date:20150930

Re: [PATCH net-next 1/2] openvswitch: add tunnel protocol to sw_flow_key

2015-09-30 Thread Jiri Benc

On Tue, 29 Sep 2015 13:41:34 -0700, Pravin Shelar wrote:
> We can rather add a TUNNEL_IPV6 flag to distinguish IPv4 and IPv6
> tunnel keys. This can be stored in ip_tunnel_key.tun_flags.

Not really. This was my original approach, too, but openvswitch is not
the only user of struct ip_tunnel_key, and in the lwtunnel core,
tun_flags are handled in a way that makes this impractical. Most
importantly, the tun_flags value is directly taken from/stored to
LWTUNNEL_IP_FLAGS/LWTUNNEL_IP6_FLAGS netlink attributes in
net/ipv4/ip_tunnel_core.c. This would mean complicated masking, etc.

> That also saves space in flow key.

The field was added to a 2-byte hole in struct sw_flow_key (still
leaving 1 byte free), so no additional space is used.
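
For illustration only (not part of the patch): a minimal user-space sketch
of why filling a padding hole costs no space. The two structs below are
hypothetical stand-ins, not the real struct sw_flow_key, and the sketch
assumes the usual ABI where uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; NOT the real struct sw_flow_key. */
struct demo_key_before {
	uint32_t a;
	uint16_t b;	/* ends at offset 6; c needs 4-byte alignment */
	uint32_t c;	/* starts at offset 8, so bytes 6..7 are padding */
};

struct demo_key_after {
	uint32_t a;
	uint16_t b;
	uint8_t  tun_proto;	/* occupies offset 6, one byte of the old hole */
	uint32_t c;
};

/* Padding bytes between the end of b and the start of c. */
size_t demo_hole_before(void)
{
	return offsetof(struct demo_key_before, c) -
	       (offsetof(struct demo_key_before, b) + sizeof(uint16_t));
}

/* Padding bytes left after tun_proto was placed in the hole. */
size_t demo_hole_after(void)
{
	return offsetof(struct demo_key_after, c) -
	       (offsetof(struct demo_key_after, tun_proto) + sizeof(uint8_t));
}
```

Under that assumption both structs have the same sizeof, the hole shrinks
from 2 bytes to 1, and nothing else moves.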

 Jiri

-- 
Jiri Benc
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ovs-dev] [PATCH net-next 1/2] openvswitch: add tunnel protocol to sw_flow_key

2015-09-30 Thread Jiri Benc

On Tue, 29 Sep 2015 19:08:44 -0700, Jesse Gross wrote:
> On Tue, Sep 29, 2015 at 10:52 AM, Jiri Benc  wrote:
> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index 5c030a4d7338..03ba070c3256 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c
> > @@ -643,6 +643,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
> > }
> >
> > SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
> > +   SW_FLOW_KEY_PUT(match, tun_proto, AF_INET, is_mask);
> 
> I don't think this is right in the case of the mask. It will cause the
> mask to be the value AF_INET - instead you want to set the mask to
> be 0xff.

I think you're right, this is a special case. I'll fix it.
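
To spell out the special case, here is a small user-space sketch. It is an
assumption about what the fix could look like, not the actual follow-up
patch; the demo_* names are hypothetical, and DEMO_AF_INET/DEMO_AF_INET6
just hard-code the Linux values 2 and 10.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_AF_INET	2	/* Linux value of AF_INET */
#define DEMO_AF_INET6	10	/* Linux value of AF_INET6 */

struct demo_key { uint8_t tun_proto; };

struct demo_match {
	struct demo_key key;	/* values to compare against */
	struct demo_key mask;	/* which bits of the key are significant */
};

/* Store the value in the key, but an all-ones byte in the mask. */
void demo_put_tun_proto(struct demo_match *m, uint8_t value, bool is_mask)
{
	if (is_mask)
		m->mask.tun_proto = 0xff;	/* exact match on the field */
	else
		m->key.tun_proto = value;
}

/* Masked comparison: only bits set in the mask take part. */
bool demo_matches(const struct demo_match *m, uint8_t pkt_proto)
{
	return (pkt_proto & m->mask.tun_proto) ==
	       (m->key.tun_proto & m->mask.tun_proto);
}
```

With the mask set to AF_INET (binary 10), only bit 1 is compared, so a
packet with tun_proto AF_INET6 (binary 1010) would match an AF_INET flow.
With 0xff the comparison is exact.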

Thanks,

 Jiri

-- 
Jiri Benc

Re: [PATCH 1/5] ntp/pps: use timespec64 for hardpps()

2015-09-30 Thread Thomas Gleixner

On Mon, 28 Sep 2015, Arnd Bergmann wrote:

> There is only one user of the hardpps function in the kernel, so
> it makes sense to atomically change it over to using 64-bit
> timestamps for y2038 safety. In the hardpps implementation,
> we also need to change the pps_normtime structure, which is
> similar to struct timespec and also requires a 64-bit
> seconds portion.
> 
> This introduces two temporary variables in pps_kc_event() to
> do the conversion, they will be removed again in the next step,
> which seemed preferable to having a larger patch changing it
> all at the same time.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

Re: [PATCH 2/5] ntp/pps: replace getnstime_raw_and_real with 64-bit version

2015-09-30 Thread Thomas Gleixner

On Mon, 28 Sep 2015, Arnd Bergmann wrote:

> There is exactly one caller of getnstime_raw_and_real in the kernel,
> which is the pps_get_ts function. This changes the caller and
> the implementation to work on timespec64 types rather than timespec,
> to avoid the time_t overflow on 32-bit architectures.
> 
> For consistency with the other new functions (ktime_get_seconds,
> ktime_get_real_*, ...), I'm renaming the function to
> ktime_get_raw_and_real_ts64.
> 
> We still need to convert from the internal 64-bit type to 32 bit
> types in the caller, but this conversion is now pushed out from
> getnstime_raw_and_real to pps_get_ts. A follow-up patch changes
> the remaining pps code to completely avoid the conversion.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

Re: [PATCH 3/5] ntp: use timespec64 in sync_cmos_clock

2015-09-30 Thread Thomas Gleixner

On Mon, 28 Sep 2015, Arnd Bergmann wrote:
> The sync_cmos_clock has one use of struct timespec, which we want to
> eventually replace with timespec64 or similar in the kernel. There
> is no way this one can overflow, but the conversion to timespec64
> is trivial and has no other dependencies.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type

2015-09-30 Thread Daniel Mack

On 09/29/2015 11:19 PM, Florian Westphal wrote:
> Daniel Mack  wrote:
>> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is run after the
>> input demux is complete and the final destination socket (if any)
>> has been determined.
>>
>> This helps filtering packets based on information stored in the
>> destination socket, such as cgroup controller supplied net class IDs.
> 
> This still seems like the 'x y' problem ("want to do X, think Y is
> correct solution; ask about Y, but that's a strange thing to do").
> 
> There is nothing that this offers over INPUT *except* that sk is
> available.  But there is zero benefit as far as I am concerned --
> why would you want to do any meaningful filtering based on the sk at
> that point...?

Well, INPUT and SOCKET_INPUT are just two different tools that help
solve different classes of problems. INPUT is for filtering all local
traffic, while SOCKET_INPUT is only for traffic that actually has a
listener, and they both make sense in different scenarios.

> Drop?  Makes no sense, else application would not be running in the first
> place.

Of course you can drop certain packets at this point, depending on other
details. Say, for instance, you want to match all packets that are
received by a certain task and that originate from IP addresses of
a specific subnet, and drop the rest. Rather than adding matches to your
global firewall configuration for all the ports that tasks may or may
not listen on, you can just do it on a higher level, from the
perspective of an administrator. If you decide to let your web server
listen on another port as well, no firewall rule configuration change is
needed at all.

Another use case is accounting. If you want to know how much traffic a
certain service or application in your system has caused, you don't want
to match all its ports to firewall rules just in order to get that
information. Instead, you can now derive that information on a
per-application basis. With this patch set, this even works just fine for
multicast listeners, which is something that is currently impossible to
achieve otherwise.

> So the only 'benefit' is that netcls id is available; but
> a) why is that even needed and

It's currently the only way of realizing application-level firewalls,
and it'd be an awesome feature if it actually worked.

> b) is such a huge sledgehammer just for net cgroup accounting
> worth it?

I really don't know if this approach is intrusive enough to make it
qualify as sledgehammer. I'd like to see some real-world benchmarks and
have proof there is a performance decrease for setups that don't use
such chains.

> Another question is what other strange things come up once we would
> open this door.

So let's discuss the possible drawbacks.

Again, the deal with this new chain type is simple: if there is no local
listener, the rules are not looked at. If you need rules that are
processed either way, put them in LOCAL_IN, as you always did.

>> listening on a specific task, the resulting error code that is sent
>> back to the remote peer can't be controlled with rules in
>> NF_INET_LOCAL_SOCKET_IN chains.
> 
> Right, and that makes this even weirder.

Well, to be more specific: you can only control the resulting error code
that is sent back to the remote peer _if_ there is a local listener. You
can do _anything_ _if_ there is a local listener. This is in line with
the above description and shouldn't cause many surprises for users.

> For deterministic ingress filtering you can only rely on what
> is contained in the packet.

Why so? For deterministic ingress filtering of traffic directed to a
local socket, you can as well rely on information associated with that
socket. And this is what application-level firewall rule sets are all about.

Daniel


Re: [PATCH 4/5] ntp/pps: use y2038 safe types in pps_event_time

2015-09-30 Thread Thomas Gleixner

On Mon, 28 Sep 2015, Arnd Bergmann wrote:

> The pps_event_time uses two 'timespec' structures internally, which
> suffer from the y2038 problem. The uses of this structure are
> fairly self-contained in the pps code, so this replaces them all at
> once.
> 
> Unfortunately, this includes the sfc ethernet driver aside from the
> pps subsystem, so we change that one as well. Both touch the
> same data structure, and there probably is no good way to split
> the patch into smaller units.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 

Re: [PATCH 5/5] net: sfc: avoid using timespec

2015-09-30 Thread Thomas Gleixner

On Mon, 28 Sep 2015, Arnd Bergmann wrote:

> The sfc driver internally uses a time format based on 32-bit (unsigned)
> seconds and 32-bit nanoseconds. This means it will overflow in 2106,
> but the value we pass into it is a signed 32-bit tv_sec that already
> overflows in 2038 to a negative value.
> 
> This patch changes the logic to use the lower 32 bits of the timespec64
> tv_sec in efx_ptp_ns_to_s_ns, which will have the correct value beyond
> the overflow.
> While this does not change any of the register values, it lets us
> keep using the driver after we deprecate the use of the timespec type
> in the kernel.
> 
> In the efx_ptp_process_times function, the change to use timespec64
> is similar, in that the tv_sec portion is ignored anyway and we only
> care about the nanosecond portion that remains unchanged.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Thomas Gleixner 
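
A minimal user-space sketch of the truncation described in the commit
message above. The demo_* helpers are hypothetical and are not the driver's
efx_ptp_ns_to_s_ns; they only show what taking the lower 32 bits of a
64-bit seconds value does around 2038 and 2106.

```c
#include <stdint.h>

/* Keep the lower 32 bits of a 64-bit seconds count (unsigned view). */
uint32_t demo_secs_to_u32(int64_t tv_sec)
{
	return (uint32_t)tv_sec;
}

/*
 * What a signed 32-bit tv_sec holds for the same instant. Converting an
 * out-of-range value to int32_t is implementation-defined in C; common
 * compilers wrap, which is the behaviour the 2038 problem refers to.
 */
int32_t demo_secs_to_s32(int64_t tv_sec)
{
	return (int32_t)(uint32_t)tv_sec;
}
```

At 2^31 seconds after the epoch (January 2038) the signed view is already
negative while the unsigned view is still correct; the unsigned view only
wraps to zero at 2^32 seconds (2106).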

Re: [PATCH 0/5] y2038 conversion for ntp/pps and sfc driver

2015-09-30 Thread Thomas Gleixner

On Tue, 29 Sep 2015, David Miller wrote:
> From: Arnd Bergmann 
> Date: Mon, 28 Sep 2015 22:21:27 +0200
> 
> > When trying to build a kernel with time_t commented out, I found that
> > the ntp subsystem still relies on timespec for its pps handling.
> > 
> > This series addresses this and converts all the code to use timespec64
> > instead, step by step. There is one device driver that interacts with
> > this code directly (rather than only through the ptp subsystem), so
> > I have to convert that driver at the same time.
> > 
> > The patches should ideally stay together as a series, but they do
> > span multiple subsystems, so I'm also looking for the right person
> > to merge them.
> 
> I'm happy with this going via a tree other than mine, and for the

I think it should go via John Stultz's timekeeping tree.

Thanks,

tglx

Re: [PATCH net-next 2/2] openvswitch: netlink attributes for IPv6 tunneling

2015-09-30 Thread Jiri Benc

On Tue, 29 Sep 2015 20:05:00 -0700, Jesse Gross wrote:
> This appears to me to be a bug in the existing code.
> ovs_tunnel_get_egress_info() as a general mechanism is still in use
> and should work with both the old and new configuration methods.

It's currently used only from the compat layer (the API used by user
space that is unaware of lwtunnels).

I don't understand what it would be good for with lwtunnel based
tunnels. The metadata_dst is created in the validate_and_copy_set_tun
function (net/openvswitch/flow_netlink.c) and used to specify egress
encapsulation metadata. The ovs_tunnel_get_egress_info function is not
needed.

> However, I agree that it doesn't look like it will work currently with
> tunnel devices. I think we need to fix this rather than making it more
> broken.

I'm not making it more broken. We currently (i.e. right now, in the
current net.git) have two APIs for tunnel specification in the ovs
kernel datapath: the old one, which is translated by the compat layer
to create a net_device, and the lwtunnel one, which requires user space
to create a (metadata) tunnel net_device and add it to the datapath.
I'm simply not adding more code to the first, legacy interface, which
seems to be the correct thing to do.

 Jiri

-- 
Jiri Benc

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Michal Kubecek

On Wed, Sep 30, 2015 at 07:54:29AM +0200, Mathias Krause wrote:
> On 29 September 2015 at 21:09, Jason Baron  wrote:
> > However, if we call connect on socket 's', to connect to a new socket
> > 'o2', we drop the reference on the original socket 'o'. Thus, we can
> > now close socket 'o' without unregistering from epoll. Then, when we
> > either close the ep or unregister 'o', we end up with this list
> > corruption. Thus, this is not a race per se, but can be triggered
> > sequentially.
> 
> Sounds profound, but the reproducer calls connect only once per
> socket. So there is no "connect to a new socket", no?

I believe there is another scenario: 'o' becomes SOCK_DEAD while 's' is
still connected to it. This is detected by 's' in unix_dgram_sendmsg()
so that 's' releases its reference on 'o' and 'o' can be freed. If this
happens before 's' is unregistered, we get use-after-free as 'o' has
never been unregistered. And as the interval between freeing 'o' and
unregistering 's' can be quite long, there is a chance for the memory to
be reused. This is what one of our customers has seen:

[exception RIP: _raw_spin_lock_irqsave+156]
RIP: 8040f5bc  RSP: 8800e929de78  RFLAGS: 00010082
RAX: a32c  RBX: 88003954ab80  RCX: 1000
RDX: f232  RSI: f232  RDI: 88003954ab80
RBP: 5220   R8: dead00100100   R9: 
R10: 7fff1a284960  R11: 0246  R12: 
R13: 8800e929de8c  R14: 000e  R15: 
ORIG_RAX:   CS: 1e030  SS: e02b
 #8 [8800e929de70] _raw_spin_lock_irqsave at 8040f5a9
 #9 [8800e929deb0] remove_wait_queue at 8006ad09
#10 [8800e929ded0] ep_unregister_pollwait at 80170043
#11 [8800e929def0] ep_remove at 80170073
#12 [8800e929df10] sys_epoll_ctl at 80171453
#13 [8800e929df80] system_call_fastpath at 80417553

In this case, crash happened on unregistering 's' which had null peer
(i.e. not reconnected but rather disconnected) but there were still two
items in the list, the other pointing to an unallocated page which has
apparently been modified in between.

IMHO unix_dgram_disconnected() could be the place to handle this issue:
it is called from both places where we disconnect from a peer (dead peer
detection in unix_dgram_sendmsg() and reconnect in unix_dgram_connect())
just before the reference to peer is released. I'm not familiar with the
epoll implementation so I'm still trying to find what exactly needs to
be done to unregister the peer at this moment.

> That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for
> write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the
> reproducer.

Sounds likely as this is the commit that introduced unix_dgram_poll()
with the code which adds the "asymmetric peer" to monitor its queue
state. More precisely, the asymmetricity check has been added by

  ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")

shortly after that.

  Michal Kubecek


Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type

2015-09-30 Thread Jan Engelhardt


On Wednesday 2015-09-30 09:24, Daniel Mack wrote:
>
>> Drop?  Makes no sense, else application would not be running in the first
>> place.
>
>Of course you can drop certain packets at this point, depending on other
>details. Say, for instance, you want to match all packets that are
>received by a certain task [...]
>Another use case is accounting. If you want to know how much traffic a
>certain service or application in your system has caused

But the sk info would be available in INPUT already, would it not?

[PATCH v5 net-next 0/2] ipv4: Hash-based multipath routing

2015-09-30 Thread Peter Nørlund

When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed
from more or less being destination-based into being quasi-random per-packet
scheduling. This increases the risk of out-of-order packets and makes it
impossible to use multipath together with anycast services.

This patch series replaces the old implementation with flow-based load
balancing based on a hash over the source and destination addresses.

Distribution of the hash is done with thresholds as described in RFC 2992.
This reduces the disruption when a path is added/removed when having more than
two paths.
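
As a rough user-space sketch of the threshold idea (the demo_* names are
hypothetical; dead and link-down next hops, which the patch marks with an
upper bound of -1, are left out): each next hop gets an inclusive upper
bound proportional to its cumulative weight over a 31-bit hash space, and
the first next hop whose bound is not below the hash is chosen.

```c
#include <stdint.h>

struct demo_nh {
	int weight;		/* configured weight, assumed > 0 */
	int32_t upper_bound;	/* inclusive upper end of this hop's region */
};

/* Assign cumulative-weight thresholds over the range [0, 2^31 - 1]. */
void demo_rebalance(struct demo_nh *nh, int n)
{
	int64_t total = 0, w = 0;
	int i;

	for (i = 0; i < n; i++)
		total += nh[i].weight;
	if (total <= 0)
		return;

	for (i = 0; i < n; i++) {
		w += nh[i].weight;
		/* Round-to-nearest division, then make the bound inclusive. */
		nh[i].upper_bound =
			(int32_t)(((INT64_C(1) << 31) * w + total / 2) / total - 1);
	}
}

/* Pick the first next hop whose region contains the 31-bit hash. */
int demo_select(const struct demo_nh *nh, int n, int32_t hash)
{
	int i;

	for (i = 0; i < n; i++)
		if (hash <= nh[i].upper_bound)
			return i;
	return n - 1;	/* not reached when the thresholds cover the range */
}
```

With two equally weighted paths the bounds come out as 2^30 - 1 and
2^31 - 1, so each path receives half of the hash space.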

To further improve the chance of successful use in conjunction with anycast, ICMP
error packets are hashed over the inner IP addresses. This ensures that PMTU
will work together with anycast or load-balancers such as IPVS.
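
A small sketch of the address swap, under stated assumptions: demo_hash is
a simple order-dependent stand-in for the seeded jhash used in the patch,
and struct demo_iphdr is a hypothetical two-field header. An ICMP error
embeds the header of the packet that triggered it, so hashing the inner
addresses in reverse order yields the hash of the flow travelling in the
same direction as the error itself.

```c
#include <stdint.h>

/* Hypothetical minimal IP header: only the fields the hash needs. */
struct demo_iphdr {
	uint32_t saddr;
	uint32_t daddr;
};

/* Stand-in for the real hash; order-dependent, result limited to 31 bits. */
uint32_t demo_hash(uint32_t saddr, uint32_t daddr)
{
	uint32_t h = 2166136261u;

	h = (h ^ saddr) * 16777619u;
	h = (h ^ daddr) * 16777619u;
	return h >> 1;
}

/* Hash of an ordinary packet: outer source, then outer destination. */
uint32_t demo_flow_hash(const struct demo_iphdr *iph)
{
	return demo_hash(iph->saddr, iph->daddr);
}

/* Hash of an ICMP error: inner addresses, swapped. */
uint32_t demo_icmp_error_hash(const struct demo_iphdr *inner)
{
	return demo_hash(inner->daddr, inner->saddr);
}
```

If host A sent a packet to B and an error for it is routed back towards A,
the error hashes like B-to-A traffic and therefore takes the same path as
the rest of that flow.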

Port numbers are not considered since fragments could cause problems with
anycast and IPVS. Relying on the DF-flag for TCP packets is also insufficient,
since ICMP inspection effectively extracts information from the opposite
flow which might have a different state of the DF-flag. This is also why the
RSS hash is not used. These are typically based on the NDIS RSS spec which
mandates TCP support.

Measurements of the additional overhead of a two-path multipath
(ip_mkroute_input excl. __mkroute_input) on a Xeon X3550 (4 cores, 2.66GHz):

Original per-packet: ~394 cycles/packet
L3 hash:  ~76 cycles/packet

Changes in v5:
- Fixed compilation error

Changes in v4:
- Functions take hash directly instead of func ptr
- Added inline hash function
- Added dummy macros to minimize ifdefs
- Use upper 31 bits of hash instead of lower

Changes in v3:
- Multipath algorithm is no longer configurable (always L3)
- Added random seed to hash
- Moved ICMP inspection to isolated function
- Ignore source quench packets (deprecated as per RFC 6633)

Changes in v2:
- Replaced 8-bit xor hash with 31-bit jenkins hash
- Don't scale weights (since 31-bit)
- Avoided unnecessary renaming of variables
- Rely on DF-bit instead of fragment offset when checking for fragmentation
- upper_bound is now inclusive to avoid overflow
- Use a callback to postpone extracting flow information until necessary
- Skipped ICMP inspection entirely with L4 hashing
- Handle newly added sysctl ignore_routes_with_linkdown

Best Regards
 Peter Nørlund


Peter Nørlund (2):
  ipv4: L3 hash-based multipath
  ipv4: ICMP packet inspection for multipath


 include/net/ip_fib.h |   14 -
 include/net/route.h  |   11 +++-
 net/ipv4/fib_semantics.c |  140 ++
 net/ipv4/icmp.c  |   19 +-
 net/ipv4/route.c |   65 ++--
 5 files changed, 173 insertions(+), 76 deletions(-)


[PATCH v5 net-next 2/2] ipv4: ICMP packet inspection for multipath

2015-09-30 Thread Peter Nørlund

From: Peter Nørlund 

ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.

Signed-off-by: Peter Nørlund 
---
 include/net/route.h |   11 +-
 net/ipv4/icmp.c |   19 -
 net/ipv4/route.c|   59 +--
 3 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index f46af25..7d79c05 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -110,7 +111,15 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
+struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
+ int mp_hash);
+
+static inline struct rtable *__ip_route_output_key(struct net *net,
+  struct flowi4 *flp)
+{
+   return __ip_route_output_key_hash(net, flp, -1);
+}
+
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
struct sock *sk);
 struct dst_entry *ipv4_blackhole_route(struct net *net,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e5eb8ac..b3a1620 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -440,6 +440,22 @@ out_unlock:
icmp_xmit_unlock(sk);
 }
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+
+/* Source and destination is swapped. See ip_multipath_icmp_hash */
+static int icmp_multipath_hash_skb(const struct sk_buff *skb)
+{
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return fib_multipath_hash(iph->daddr, iph->saddr);
+}
+
+#else
+
+#define icmp_multipath_hash_skb(skb) (-1)
+
+#endif
+
 static struct rtable *icmp_route_lookup(struct net *net,
struct flowi4 *fl4,
struct sk_buff *skb_in,
@@ -464,7 +480,8 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_oif = vrf_master_ifindex(skb_in->dev);
 
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
-   rt = __ip_route_output_key(net, fl4);
+   rt = __ip_route_output_key_hash(net, fl4,
+   icmp_multipath_hash_skb(skb_in));
if (IS_ERR(rt))
return rt;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 64367f3..a2479a4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1646,6 +1646,48 @@ out:
return err;
 }
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+
+/* To make ICMP packets follow the right flow, the multipath hash is
+ * calculated from the inner IP addresses in reverse order.
+ */
+static int ip_multipath_icmp_hash(struct sk_buff *skb)
+{
+   const struct iphdr *outer_iph = ip_hdr(skb);
+   struct icmphdr _icmph;
+   const struct icmphdr *icmph;
+   struct iphdr _inner_iph;
+   const struct iphdr *inner_iph;
+
+   if (unlikely((outer_iph->frag_off & htons(IP_OFFSET)) != 0))
+   goto standard_hash;
+
+   icmph = skb_header_pointer(skb, outer_iph->ihl * 4, sizeof(_icmph),
+  &_icmph);
+   if (!icmph)
+   goto standard_hash;
+
+   if (icmph->type != ICMP_DEST_UNREACH &&
+   icmph->type != ICMP_REDIRECT &&
+   icmph->type != ICMP_TIME_EXCEEDED &&
+   icmph->type != ICMP_PARAMETERPROB) {
+   goto standard_hash;
+   }
+
+   inner_iph = skb_header_pointer(skb,
+  outer_iph->ihl * 4 + sizeof(_icmph),
+  sizeof(_inner_iph), &_inner_iph);
+   if (!inner_iph)
+   goto standard_hash;
+
+   return fib_multipath_hash(inner_iph->daddr, inner_iph->saddr);
+
+standard_hash:
+   return fib_multipath_hash(outer_iph->saddr, outer_iph->daddr);
+}
+
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
+
 static int ip_mkroute_input(struct sk_buff *skb,
struct fib_result *res,
const struct flowi4 *fl4,
@@ -1656,7 +1698,10 @@ static int ip_mkroute_input(struct sk_buff *skb,
if (res->fi && res->fi->fib_nhs > 1) {
int h;
 
-   h = fib_multipath_hash(saddr, daddr);
+   if (unlikely(ip_hdr(skb)->protocol == IPPROTO_ICMP))
+   h = ip_multipath_icmp_hash(skb);
+   else
+   h = fib_multipath_hash(saddr, daddr);
fib_select_multipath(res, h);
}
 #endif
@@ -2042,7 +2087,8 @@ add:
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
+struct rtable *__ip_route_out

[PATCH v5 net-next 1/2] ipv4: L3 hash-based multipath

2015-09-30 Thread Peter Nørlund

From: Peter Nørlund 

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.

Signed-off-by: Peter Nørlund 
---
 include/net/ip_fib.h |   14 -
 net/ipv4/fib_semantics.c |  140 +-
 net/ipv4/route.c |   16 --
 3 files changed, 98 insertions(+), 72 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 727d6e9..7a51fd8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -79,7 +79,7 @@ struct fib_nh {
unsigned char   nh_scope;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
int nh_weight;
-   int nh_power;
+   atomic_t nh_upper_bound;
 #endif
 #ifdef CONFIG_IP_ROUTE_CLASSID
__u32   nh_tclassid;
@@ -118,7 +118,7 @@ struct fib_info {
 #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
int fib_nhs;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   int fib_power;
+   int fib_weight;
 #endif
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
@@ -320,7 +320,15 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev);
 int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
-void fib_select_multipath(struct fib_result *res);
+
+extern u32 fib_multipath_secret __read_mostly;
+
+static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
+{
+   return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1;
+}
+
+void fib_select_multipath(struct fib_result *res, int hash);
 
 /* Exported by fib_trie.c */
 void fib_trie_init(void);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 064bd3c..0c49d2f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -57,8 +57,7 @@ static unsigned int fib_info_cnt;
 static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-
-static DEFINE_SPINLOCK(fib_multipath_lock);
+u32 fib_multipath_secret __read_mostly;
 
 #define for_nexthops(fi) { \
int nhsel; const struct fib_nh *nh; \
@@ -532,7 +531,67 @@ errout:
return ret;
 }
 
-#endif
+static void fib_rebalance(struct fib_info *fi)
+{
+   int total;
+   int w;
+   struct in_device *in_dev;
+
+   if (fi->fib_nhs < 2)
+   return;
+
+   total = 0;
+   for_nexthops(fi) {
+   if (nh->nh_flags & RTNH_F_DEAD)
+   continue;
+
+   in_dev = __in_dev_get_rcu(nh->nh_dev);
+
+   if (in_dev &&
+   IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+   nh->nh_flags & RTNH_F_LINKDOWN)
+   continue;
+
+   total += nh->nh_weight;
+   } endfor_nexthops(fi);
+
+   w = 0;
+   change_nexthops(fi) {
+   int upper_bound;
+
+   in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
+
+   if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
+   upper_bound = -1;
+   } else if (in_dev &&
+  IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+  nexthop_nh->nh_flags & RTNH_F_LINKDOWN) {
+   upper_bound = -1;
+   } else {
+   w += nexthop_nh->nh_weight;
+   upper_bound = DIV_ROUND_CLOSEST(2147483648LL * w,
+   total) - 1;
+   }
+
+   atomic_set(&nexthop_nh->nh_upper_bound, upper_bound);
+   } endfor_nexthops(fi);
+
+   net_get_random_once(&fib_multipath_secret,
+   sizeof(fib_multipath_secret));
+}
+
+static inline void fib_add_weight(struct fib_info *fi,
+ const struct fib_nh *nh)
+{
+   fi->fib_weight += nh->nh_weight;
+}
+
+#else /* CONFIG_IP_ROUTE_MULTIPATH */
+
+#define fib_rebalance(fi) do { } while (0)
+#define fib_add_weight(fi, nh) do { } while (0)
+
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
 
 static int fib_encap_match(struct net *net, u16 encap_type,
   struct nlattr *encap,
@@ -1094,8 +1153,11 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 
change_nexthops(fi) {
fib_info_update_nh_saddr(net, nexthop_nh);
+   fib_add_weight(fi, nexthop_nh);
} endfor_nexthops(fi)
 
+   fib_rebalance(fi);
+
 link_it:
ofi = fib_find_info(fi);
if (ofi) {
@@ -1317,12 +1379,6 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
break;

[RFC PATCH 1/3] net: dsa: Use devm_ prefixed allocations

2015-09-30 Thread Neil Armstrong

To simplify and prevent memory leakage when unbinding, use
the devm_ memory allocation calls.

Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index c59fa5d..98f94c2 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -305,7 +305,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
if (ret < 0)
goto out;

-   ds->slave_mii_bus = mdiobus_alloc();
+   ds->slave_mii_bus = devm_mdiobus_alloc(parent);
if (ds->slave_mii_bus == NULL) {
ret = -ENOMEM;
goto out;
@@ -400,7 +400,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
/*
 * Allocate and initialise switch state.
 */
-   ds = kzalloc(sizeof(*ds) + drv->priv_size, GFP_KERNEL);
+   ds = devm_kzalloc(parent, sizeof(*ds) + drv->priv_size, GFP_KERNEL);
if (ds == NULL)
return ERR_PTR(-ENOMEM);

@@ -883,7 +883,7 @@ static int dsa_probe(struct platform_device *pdev)
goto out;
}

-   dst = kzalloc(sizeof(*dst), GFP_KERNEL);
+   dst = devm_kzalloc(&pdev->dev, sizeof(*dst), GFP_KERNEL);
if (dst == NULL) {
dev_put(dev);
ret = -ENOMEM;
-- 
1.9.1

[RFC PATCH 3/3] net: dsa: exit probe if no switch were found

2015-09-30 Thread Neil Armstrong

If no switch was found in dsa_setup_dst, return -ENODEV and
exit the dsa_probe cleanly.

Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 0c104af..6ae1ab9 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -844,10 +844,11 @@ static inline void dsa_of_remove(struct device *dev)
 }
 #endif

-static void dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
+static int dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
  struct device *parent, struct dsa_platform_data *pd)
 {
int i;
+   unsigned configured = 0;

dst->pd = pd;
dst->master_netdev = dev;
@@ -867,9 +868,17 @@ static void dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
dst->ds[i] = ds;
if (ds->drv->poll_link != NULL)
dst->link_poll_needed = 1;
+
+   ++configured;
}

/*
+* If no switch was found, exit cleanly
+*/
+   if (!configured)
+   return -ENODEV;
+
+   /*
 * If we use a tagging format that doesn't have an ethertype
 * field, make sure that all packets from this point on get
 * sent to the tag format's receive function.
@@ -885,6 +894,8 @@ static void dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
dst->link_poll_timer.expires = round_jiffies(jiffies + HZ);
add_timer(&dst->link_poll_timer);
}
+
+   return 0;
 }

 static int dsa_probe(struct platform_device *pdev)
@@ -934,9 +945,9 @@ static int dsa_probe(struct platform_device *pdev)

platform_set_drvdata(pdev, dst);

-   dsa_setup_dst(dst, dev, &pdev->dev, pd);
-
-   return 0;
+   ret = dsa_setup_dst(dst, dev, &pdev->dev, pd);
+   if (!ret)
+   return 0;

 out:
dsa_of_remove(&pdev->dev);
-- 
1.9.1
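
The control flow added by this patch can be modelled in user space. The sketch below is only an illustration under stated assumptions: struct model_tree, MODEL_MAX_SWITCHES and model_setup_dst() are hypothetical stand-ins, not kernel symbols, and a NULL slot stands for a switch whose setup failed.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_SWITCHES 4	/* hypothetical limit for this sketch */

/* Hypothetical stand-in for struct dsa_switch_tree. */
struct model_tree {
	const void *ds[MODEL_MAX_SWITCHES];	/* NULL: setup of that switch failed */
};

/*
 * Count the switches that were set up and report -ENODEV when there
 * are none, mirroring the "configured" counter added by the patch.
 */
int model_setup_dst(const struct model_tree *dst, int nr_chips)
{
	unsigned int configured = 0;
	int i;

	for (i = 0; i < nr_chips && i < MODEL_MAX_SWITCHES; i++) {
		if (dst->ds[i] == NULL)
			continue;	/* this switch could not be set up */
		++configured;
	}

	/* If no switch was found, let the caller unwind. */
	if (!configured)
		return -ENODEV;

	return 0;
}
```

A caller in the style of dsa_probe() would return 0 on success and fall through to its cleanup label on any other value.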

[RFC PATCH 2/3] net: dsa: complete dsa_switch_destroy calls

2015-09-30 Thread Neil Armstrong

When unbinding dsa, complete the dsa_switch_destroy to cleanly
destroy and unregister the net and mdio devices.

Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 98f94c2..0c104af 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dsa_priv.h"

 char dsa_driver_version[] = "0.1";
@@ -420,10 +421,51 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,

 static void dsa_switch_destroy(struct dsa_switch *ds)
 {
+   struct device_node *port_dn;
+   struct phy_device *phydev;
+   struct dsa_chip_data *cd = ds->pd;
+   int port;
+
 #ifdef CONFIG_NET_DSA_HWMON
if (ds->hwmon_dev)
hwmon_device_unregister(ds->hwmon_dev);
 #endif
+
+   /* Disable configuration of the CPU and DSA ports */
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+   if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
+   continue;
+
+   port_dn = cd->port_dn[port];
+   if (of_phy_is_fixed_link(port_dn)) {
+   phydev = of_phy_find_device(port_dn);
+   if (phydev) {
+   int addr = phydev->addr;
+   phy_device_free(phydev);
+   of_node_put(port_dn);
+   fixed_phy_del(addr);
+   }
+   }
+   }
+
+   /*
+* Destroy network devices for physical switch ports.
+*/
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+   if (!(ds->phys_port_mask & (1 << port)))
+   continue;
+
+   if (!ds->ports[port])
+   continue;
+
+   unregister_netdev(ds->ports[port]);
+   free_netdev(ds->ports[port]);
+   }
+
+   /*
+* Do basic unregister.
+*/
+   mdiobus_unregister(ds->slave_mii_bus);
 }

 #ifdef CONFIG_PM_SLEEP
-- 
1.9.1

[RFC PATCH 0/3] net: dsa: Complete and fix the dsa unbinding

2015-09-30 Thread Neil Armstrong

In order to cleanly unbind the dsa core, either as a module removal
or a platform device unbind, switch the allocations to their devm_
counterparts and complete the destroy functions.

The last patch is an experimental way to exit the probe when no
switch is found in the discovery process.

The patches are based on the current net-next.

Neil Armstrong (3):
  net: dsa: Use devm_ prefixed allocations
  net: dsa: complete dsa_switch_destroy calls
  net: dsa: exit probe if no switch were found

 net/dsa/dsa.c | 67 ---
 1 file changed, 60 insertions(+), 7 deletions(-)

-- 
1.9.1

Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function

2015-09-30 Thread Steffen Klassert

On Tue, Sep 29, 2015 at 04:58:46PM -0600, David Ahern wrote:
> Hi Tom:
> 
> On 9/29/15 4:17 PM, Tom Herbert wrote:
> >This patch adds xfrm6_xlat_addr which is called in the data path
> >to perform address translation (primarily for the receive path). Modules
> >may register their own callback to perform a translation-- this
> >registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
> >xfrm6_xlat_addr allows translation of addresses for an sk_buff.
> 
> 
> Seems like a stretch to lump this into xfrms. You have a separate
> genl based config as opposed to the netlink xfrm API and you are
> calling the xlat_addr function directly in ip6_rcv as opposed to via
> some policy with dst_ops driven redirection. Why call this a xfrm?

I have to agree here. We have policies and states to do the lookups
and to describe the transformation. Just adding a callback to do this
in a different way does not integrate well into xfrm.

Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type

2015-09-30 Thread Daniel Mack

On 09/30/2015 09:40 AM, Jan Engelhardt wrote:
> 
> On Wednesday 2015-09-30 09:24, Daniel Mack wrote:
>>
>>> Drop?  Makes no sense, else application would not be running in the first
>>> place.
>>
>> Of course you can drop certain packets at this point, depending on other
>> details. Say, for instance, you want to match all packets that are
>> received by a certain task [...]
>> Another use case is accounting. If you want to know how much traffic a
>> certain service or application in your system has caused
> 
> But the sk info would be available in INPUT already, would it not?

No, only for established connections, as those are subject to early
demux which sets skb->sk. For all other packets, netfilter callbacks are
called with skb->sk == NULL.

That's the whole point of this patch set ;)


Daniel


Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv

2015-09-30 Thread Steffen Klassert

On Tue, Sep 29, 2015 at 03:17:22PM -0700, Tom Herbert wrote:
> Call before performing NF_HOOK and routing in order to perform address
> translation in the receive path.
> 
> Signed-off-by: Tom Herbert 
> ---
>  net/ipv6/ip6_input.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 9075acf..06dac55 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
>   /* Must drop socket now because of tproxy. */
>   skb_orphan(skb);
>  
> + /* Translate destination address before routing */
> + xfrm6_xlat_addr(skb);
> +

This shows that xfrm is not the right place to add this. The existing
xfrm hooks are located at the same place as your current LWT hooks are.

You could use the existing xfrm hooks similar to xfrm tunnel modes.
This reinserts the transformed packet back into layer2, but I guess
this is not what you want.

I'm currently playing with a GRO codepath for IPsec to get the
packets transformed early. If you can do your address translation
that early, it could be an option too. This clearly depends on
enabled GRO at the receiving device, but you would still have
the LWT hook as a fallback.

>   return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
>  net, NULL, skb, dev, NULL,
>  ip6_rcv_finish);

Or, try to use the netfilter hook that seems to be at the right
place at least.


[PATCH] net ipv4: use preferred log methods

2015-09-30 Thread Bastian Stender

Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.

Signed-off-by: Bastian Stender 
---
 net/ipv4/ipconfig.c| 53 +-
 net/ipv4/netfilter/arp_tables.c| 17 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |  2 +-
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 31 ---
 4 files changed, 36 insertions(+), 67 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index ed4ef09..b389d8b 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -65,15 +65,6 @@
 #include 
 #include 
 
-/* Define this to allow debugging output */
-#undef IPCONFIG_DEBUG
-
-#ifdef IPCONFIG_DEBUG
-#define DBG(x) printk x
-#else
-#define DBG(x) do { } while(0)
-#endif
-
 #if defined(CONFIG_IP_PNP_DHCP)
 #define IPCONFIG_DHCP
 #endif
@@ -223,7 +214,7 @@ static int __init ic_open_devs(void)
if (dev->mtu >= 364)
able |= IC_BOOTP;
else
-   pr_warn("DHCP/BOOTP: Ignoring device %s, MTU %d too small",
+   pr_warn("DHCP/BOOTP: Ignoring device %s, MTU %d too small\n",
dev->name, dev->mtu);
if (!(dev->flags & IFF_NOARP))
able |= IC_RARP;
@@ -460,7 +451,8 @@ static int __init ic_defaults(void)
   &ic_myaddr);
return -1;
}
-   printk("IP-Config: Guessing netmask %pI4\n", &ic_netmask);
+   pr_notice("IP-Config: Guessing netmask %pI4\n",
+ &ic_netmask);
}
 
return 0;
@@ -671,9 +663,7 @@ ic_dhcp_init_options(u8 *options)
u8 *e = options;
int len;
 
-#ifdef IPCONFIG_DEBUG
-   printk("DHCP: Sending message type %d\n", mt);
-#endif
+   pr_debug("DHCP: Sending message type %d\n", mt);
 
memcpy(e, ic_bootp_cookie, 4);  /* RFC1048 Magic Cookie */
e += 4;
@@ -833,7 +823,8 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d
else if (dev->type == ARPHRD_FDDI)
b->htype = ARPHRD_ETHER;
else {
-   printk("Unknown ARP type 0x%04x for device %s\n", dev->type, dev->name);
+   pr_warn("Unknown ARP type 0x%04x for device %s\n", dev->type,
+   dev->name);
b->htype = dev->type; /* can cause undefined behavior */
}
 
@@ -857,12 +848,12 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d
if (dev_hard_header(skb, dev, ntohs(skb->protocol),
dev->broadcast, dev->dev_addr, skb->len) < 0) {
kfree_skb(skb);
-   printk("E");
+   pr_alert("E\n");
return;
}
 
if (dev_queue_xmit(skb) < 0)
-   printk("E");
+   pr_alert("E\n");
 }
 
 
@@ -890,14 +881,12 @@ static void __init ic_do_bootp_ext(u8 *ext)
int i;
__be16 mtu;
 
-#ifdef IPCONFIG_DEBUG
u8 *c;
 
-   printk("DHCP/BOOTP: Got extension %d:",*ext);
+   pr_debug("DHCP/BOOTP: Got extension %d:", *ext);
for (c=ext+2; cyour_ip;
ic_servaddr = server_id;
-#ifdef IPCONFIG_DEBUG
-   printk("DHCP: Offered address %pI4 by server %pI4\n",
-  &ic_myaddr, &b->iph.saddr);
-#endif
+   pr_debug("DHCP: Offered address %pI4 by server %pI4\n",
+&ic_myaddr, &b->iph.saddr);
/* The DHCP indicated server address takes
 * precedence over the bootp header one if
 * they are different.
@@ -1264,7 +1249,7 @@ static int __init ic_dynamic(void)
if (timeout > CONF_TIMEOUT_MAX)
timeout = CONF_TIMEOUT_MAX;
 
-   pr_cont(".");
+   pr_cont(".\n");
}
 
 #ifdef IPCONFIG_BOOTP
@@ -1281,10 +1266,10 @@ static int __init ic_dynamic(void)
return -1;
}
 
-   printk("IP-Config: Got %s answer from %pI4, ",
+   pr_info("IP-Config: Got %s answer from %pI4, ",
((ic_got_reply & IC_RARP) ? "RARP"
-: (ic_proto_enabled & IC_USE_DHCP) ? "DHCP" : "BOOTP"),
-  &ic_addrservaddr);
+   : (ic_proto_enabled & IC_USE_DHCP) ? "DHCP" : "BOOTP"),
+   &ic_addrservaddr);
pr_cont("my address is %pI4\n", &ic_myaddr);
 
return 0;
@@ -1686,7 +1671,7 @@ static int __init vendor_class_identifier_setup(char *addrs)
if (strlcpy(vendor_class_identifier, addrs,
sizeof(vendor_class_identifier))
>= sizeof(vendor_class_identifier))
-

Re: [PATCH] net ipv4: use preferred log methods

2015-09-30 Thread kbuild test robot

Hi Bastian,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: mips-jz4740 (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 5b4b43e3d9b6dcebef0324965111b8e5a8bcd6e8
  # save the attached .config to linux build tree
  make.cross ARCH=mips 

All error/warnings (new ones prefixed by >>):

   net/ipv4/ipconfig.c: In function 'ic_open_devs':
>> net/ipv4/ipconfig.c:244:4: error: implicit declaration of function 'DBG' 
>> [-Werror=implicit-function-declaration]
   DBG(("IP-Config: %s UP (able=%d, xid=%08x)\n",
   ^
>> net/ipv4/ipconfig.c:244:49: warning: left-hand operand of comma expression 
>> has no effect [-Wunused-value]
   DBG(("IP-Config: %s UP (able=%d, xid=%08x)\n",
^
   net/ipv4/ipconfig.c:245:14: warning: left-hand operand of comma expression 
has no effect [-Wunused-value]
dev->name, able, d->xid));
 ^
   net/ipv4/ipconfig.c:245:20: warning: left-hand operand of comma expression 
has no effect [-Wunused-value]
dev->name, able, d->xid));
   ^
   net/ipv4/ipconfig.c: In function 'ic_close_devs':
   net/ipv4/ipconfig.c:301:34: warning: left-hand operand of comma expression 
has no effect [-Wunused-value]
   DBG(("IP-Config: Downing %s\n", dev->name));
 ^
   net/ipv4/ipconfig.c: In function 'ip_auto_config_setup':
   net/ipv4/ipconfig.c:1602:43: warning: left-hand operand of comma expression 
has no effect [-Wunused-value]
   DBG(("IP-Config: Parameter #%d: `%s'\n", num, ip));
  ^
   net/ipv4/ipconfig.c:1602:48: warning: left-hand operand of comma expression 
has no effect [-Wunused-value]
   DBG(("IP-Config: Parameter #%d: `%s'\n", num, ip));
   ^
   cc1: some warnings being treated as errors

vim +/DBG +244 net/ipv4/ipconfig.c

^1da177e Linus Torvalds 2005-04-16  238 d->able = able;
^1da177e Linus Torvalds 2005-04-16  239 if (able & 
IC_BOOTP)
5a874db4 Al Viro2006-11-08  240 
get_random_bytes(&d->xid, sizeof(__be32));
^1da177e Linus Torvalds 2005-04-16  241 else
^1da177e Linus Torvalds 2005-04-16  242 d->xid 
= 0;
^1da177e Linus Torvalds 2005-04-16  243 
ic_proto_have_if |= able;
^1da177e Linus Torvalds 2005-04-16 @244 
DBG(("IP-Config: %s UP (able=%d, xid=%08x)\n",
^1da177e Linus Torvalds 2005-04-16  245 
dev->name, able, d->xid));
^1da177e Linus Torvalds 2005-04-16  246 }
^1da177e Linus Torvalds 2005-04-16  247 }

:: The code at line 244 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH] net ipv4: use preferred log methods

2015-09-30 Thread kbuild test robot

Hi Bastian,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: mips-fuloong2e_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 5b4b43e3d9b6dcebef0324965111b8e5a8bcd6e8
  # save the attached .config to linux build tree
  make.cross ARCH=mips 

All error/warnings (new ones prefixed by >>):

   net/ipv4/netfilter/arp_tables.c: In function 'arp_packet_match':
>> net/ipv4/netfilter/arp_tables.c:102:3: error: implicit declaration of 
>> function 'dprintf' [-Werror=implicit-function-declaration]
  dprintf("ARP operation field mismatch.\n");
  ^
   net/ipv4/netfilter/arp_tables.c: In function 'arp_checkentry':
>> net/ipv4/netfilter/arp_tables.c:194:3: error: implicit declaration of 
>> function 'duprintf' [-Werror=implicit-function-declaration]
  duprintf("Unknown flag bits set: %08X\n",
  ^
   cc1: some warnings being treated as errors

vim +/dprintf +102 net/ipv4/netfilter/arp_tables.c

ddc214c4 Eric Dumazet2009-02-18   96long ret;
^1da177e Linus Torvalds  2005-04-16   97  
e79ec50b Jan Engelhardt  2007-12-17   98  #define FWINV(bool, invflg) ((bool) ^ 
!!(arpinfo->invflags & (invflg)))
^1da177e Linus Torvalds  2005-04-16   99  
^1da177e Linus Torvalds  2005-04-16  100if (FWINV((arphdr->ar_op & 
arpinfo->arpop_mask) != arpinfo->arpop,
^1da177e Linus Torvalds  2005-04-16  101  ARPT_INV_ARPOP)) {
^1da177e Linus Torvalds  2005-04-16 @102dprintf("ARP operation 
field mismatch.\n");
^1da177e Linus Torvalds  2005-04-16  103dprintf("ar_op: %04x 
info->arpop: %04x info->arpop_mask: %04x\n",
^1da177e Linus Torvalds  2005-04-16  104arphdr->ar_op, 
arpinfo->arpop, arpinfo->arpop_mask);
^1da177e Linus Torvalds  2005-04-16  105return 0;
^1da177e Linus Torvalds  2005-04-16  106}
^1da177e Linus Torvalds  2005-04-16  107  
^1da177e Linus Torvalds  2005-04-16  108if (FWINV((arphdr->ar_hrd & 
arpinfo->arhrd_mask) != arpinfo->arhrd,
^1da177e Linus Torvalds  2005-04-16  109  ARPT_INV_ARPHRD)) {
^1da177e Linus Torvalds  2005-04-16  110dprintf("ARP hardware 
address format mismatch.\n");
^1da177e Linus Torvalds  2005-04-16  111dprintf("ar_hrd: %04x 
info->arhrd: %04x info->arhrd_mask: %04x\n",
^1da177e Linus Torvalds  2005-04-16  112arphdr->ar_hrd, 
arpinfo->arhrd, arpinfo->arhrd_mask);
^1da177e Linus Torvalds  2005-04-16  113return 0;
^1da177e Linus Torvalds  2005-04-16  114}
^1da177e Linus Torvalds  2005-04-16  115  
^1da177e Linus Torvalds  2005-04-16  116if (FWINV((arphdr->ar_pro & 
arpinfo->arpro_mask) != arpinfo->arpro,
^1da177e Linus Torvalds  2005-04-16  117  ARPT_INV_ARPPRO)) {
^1da177e Linus Torvalds  2005-04-16  118dprintf("ARP protocol 
address format mismatch.\n");
^1da177e Linus Torvalds  2005-04-16  119dprintf("ar_pro: %04x 
info->arpro: %04x info->arpro_mask: %04x\n",
^1da177e Linus Torvalds  2005-04-16  120arphdr->ar_pro, 
arpinfo->arpro, arpinfo->arpro_mask);
^1da177e Linus Torvalds  2005-04-16  121return 0;
^1da177e Linus Torvalds  2005-04-16  122}
^1da177e Linus Torvalds  2005-04-16  123  
^1da177e Linus Torvalds  2005-04-16  124if (FWINV((arphdr->ar_hln & 
arpinfo->arhln_mask) != arpinfo->arhln,
^1da177e Linus Torvalds  2005-04-16  125  ARPT_INV_ARPHLN)) {
^1da177e Linus Torvalds  2005-04-16  126dprintf("ARP hardware 
address length mismatch.\n");
^1da177e Linus Torvalds  2005-04-16  127dprintf("ar_hln: %02x 
info->arhln: %02x info->arhln_mask: %02x\n",
^1da177e Linus Torvalds  2005-04-16  128arphdr->ar_hln, 
arpinfo->arhln, arpinfo->arhln_mask);
^1da177e Linus Torvalds  2005-04-16  129return 0;
^1da177e Linus Torvalds  2005-04-16  130}
^1da177e Linus Torvalds  2005-04-16  131  
^1da177e Linus Torvalds  2005-04-16  132src_devaddr = arpptr;
^1da177e Linus Torvalds  2005-04-16  133arpptr += dev->addr_len;
^1da177e Linus Torvalds  2005-04-16  134memcpy(&src_ipaddr, arpptr, 
sizeof(u32));
^1da177e Linus Torvalds  2005-04-16  135arpptr += sizeof(u32);
^1da177e Linus Torvalds  2005-04-16  136tgt_devaddr = arpptr;
^1da177e Linus Torvalds  2005-04-16  137arpptr += dev->addr_len;
^1da177e Linus Torvalds  2005-04-16  138memcpy(&tgt_ipaddr, arpptr, 
sizeof(u32));
^1da177e Linus Torvalds  2005-04-16  139  
^1da177e Linus Torvalds  2005-04-16  140if 
(FWINV(arp_devaddr_compare(&arpinfo->src_devaddr, src_devaddr, dev->addr_len),
^1da177e Linus Torvalds  2005-04-16  141  ARPT_INV_SRCDEVADDR) 
||
^1da177e Linus Torvalds  200

Re: [PATCH 1/1] xfrm: Fix state threshold configuration from userspace

2015-09-30 Thread Steffen Klassert

On Tue, Sep 29, 2015 at 11:25:08AM +0200, Michael Rossberg wrote:
> Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
> timer (XFRMA_ETIMER_THRESH) of a state without having to set other
> attributes like replay counter and byte lifetime. Changing these other
> values while traffic flows will break the state.
> 
> Signed-off-by: Michael Rossberg 

Applied to the ipsec tree, thanks Michael!

[PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()

2015-09-30 Thread Guillaume Nault

Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference at 
04e0
[  570.142931] IP: [] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops:  [#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core 
ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel 
ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng 
aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq 
evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk 
virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[  570.144601] task: 88003d30d600 ti: 880036b6 task.ti: 
880036b6
[  570.144601] RIP: 0010:[]  [] 
pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:880036b63e08  EFLAGS: 00010202
[  570.144601] RAX:  RBX: 88003434 RCX: 0206
[  570.144601] RDX: 0006 RSI: 88003d30dd20 RDI: 88003d30dd20
[  570.144601] RBP: 880036b63e28 R08: 0001 R09: 
[  570.144601] R10: 7ffee9b50420 R11: 880034340078 R12: 8800387ec780
[  570.144601] R13: 8800387ec7b0 R14: 88003e222aa0 R15: 8800387ec7b0
[  570.144601] FS:  7f5672f48700() GS:88003fc8() 
knlGS:
[  570.144601] CS:  0010 DS:  ES:  CR0: 80050033
[  570.144601] CR2: 04e0 CR3: 37f7e000 CR4: 000406a0
[  570.144601] Stack:
[  570.144601]  a018f240 8800387ec780 a018f240 
8800387ec7b0
[  570.144601]  880036b63e48 812caabe 880039e4e000 
0008
[  570.144601]  880036b63e58 812cabad 880036b63ea8 
811347f5
[  570.144601] Call Trace:
[  570.144601]  [] sock_release+0x1a/0x75
[  570.144601]  [] sock_close+0xd/0x11
[  570.144601]  [] __fput+0xff/0x1a5
[  570.144601]  [] fput+0x9/0xb
[  570.144601]  [] task_work_run+0x66/0x90
[  570.144601]  [] prepare_exit_to_usermode+0x8c/0xa7
[  570.144601]  [] syscall_return_slowpath+0x16d/0x19b
[  570.144601]  [] int_ret_from_sys_call+0x25/0x9f
[  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 
b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 
80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
[  570.144601] RIP  [] pppoe_release+0x50/0x101 [pppoe]
[  570.144601]  RSP 
[  570.144601] CR2: 04e0
[  570.200518] ---[ end trace 46956baf17349563 ]---

pppoe_flush_dev() has no reason to override sk->sk_state with
PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
PPPOX_DEAD, which is the correct state given that sk is unbound and
po->pppoe_dev is NULL.

Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Tested-by: Oleksii Berezhniak 
Signed-off-by: Guillaume Nault 
---
 drivers/net/ppp/pppoe.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
if (po->pppoe_dev == dev &&
sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
pppox_unbind_sock(sk);
-   sk->sk_state = PPPOX_ZOMBIE;
sk->sk_state_change(sk);
po->pppoe_dev = NULL;
dev_put(dev);
-- 
2.5.3
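
The interaction described in the commit message can be sketched as a small user-space model. Everything here is an assumption made for illustration: the model_* names are hypothetical, the state values are placeholders rather than the kernel's, and the release check only models the ZOMBIE case that the commit message mentions.

```c
#include <stddef.h>

/* Placeholder values for this sketch, not copied from kernel headers. */
enum model_state {
	MODEL_CONNECTED = 1,
	MODEL_BOUND     = 2,
	MODEL_ZOMBIE    = 4,
	MODEL_DEAD      = 8,
};

struct model_sock {
	int state;		/* stands in for sk->sk_state */
	const void *dev;	/* stands in for po->pppoe_dev */
};

/* Flush as done before the patch: the state is overridden with ZOMBIE. */
void model_flush_old(struct model_sock *sk)
{
	sk->state = MODEL_DEAD;		/* what the unbind step leaves behind */
	sk->state = MODEL_ZOMBIE;	/* the override removed by the patch */
	sk->dev = NULL;
}

/* Flush as done after the patch: the DEAD state is kept. */
void model_flush_new(struct model_sock *sk)
{
	sk->state = MODEL_DEAD;
	sk->dev = NULL;
}

/* Returns 1 when release would drop a reference through a NULL device. */
int model_release_would_oops(const struct model_sock *sk)
{
	int expects_dev = (sk->state & MODEL_ZOMBIE) != 0;

	return expects_dev && sk->dev == NULL;
}
```

With the old flush, the socket ends up ZOMBIE with no device, which is the combination that release trips over; with the new flush it ends up DEAD and release has nothing to put.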


Re: xfrm4_garbage_collect reaching limit

2015-09-30 Thread Steffen Klassert

On Mon, Sep 21, 2015 at 10:51:11AM -0400, Dan Streetman wrote:
> On Fri, Sep 18, 2015 at 1:00 AM, Dan Streetman  wrote:
> > On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
> >  wrote:
> >>
> >> What about the patch below? With this we are independent of the number
> >> of cpus. It should cover most, if not all usecases.
> >
> > yep that works, thanks!  I'll give it a test also, but I don't see how
> > it would fail.
> 
> Yep, on a test setup that previously failed within several hours, it
> ran over the weekend successfully.  Thanks!
> 
> Tested-by: Dan Streetman 
> 
> >
> >>
> >> While we are at it, we could think about increasing the flowcache
> >> percpu limit. This value was chosen back in 2003, so maybe we could
> >> have more than 4k cache entries per cpu these days.
> >>
> >>
> >> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
> >>
> >> The xfrm flowcache size is limited by the flowcache limit
> >> (4096 * number of online cpus) and the xfrm garbage collector
> >> threshold (2 * 32768), whichever is reached first. This means
> >> that we can hit the garbage collector limit only on systems
> >> with more than 16 cpus. On such systems we simply refuse
> >> new allocations if we reach the limit, so new flows are dropped.
> >> On systems with 16 or fewer cpus, we hit the flowcache limit.
> >> In this case, we shrink the flow cache instead of refusing new
> >> flows.
> >>
> >> We increase the xfrm garbage collector threshold to INT_MAX
> >> to get the same behaviour, independent of the number of cpus.
> >>
> >> The xfrm garbage collector threshold can still be set below
> >> the flowcache limit to reduce the memory usage of the flowcache.
> >>
> >> Signed-off-by: Steffen Klassert 

I've applied this to ipsec-next now. It can be considered as a fix too,
but we still can tweak the value via the sysctl in the meantime. So
it is better to test it a bit longer before it hits the mainline.

Thanks a lot for your work Dan!
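
The 16-cpu crossover mentioned in the quoted commit message follows from the two figures it gives. A short sketch of that arithmetic, with macro and function names chosen here for illustration only (they are not the kernel's identifiers):

```c
/* Figures quoted in the commit message above. */
#define FLOWCACHE_PER_CPU_LIMIT	4096UL
#define XFRM_GC_THRESH		32768UL

/* Flow cache limit for a given number of online cpus. */
unsigned long flowcache_limit(unsigned long ncpus)
{
	return FLOWCACHE_PER_CPU_LIMIT * ncpus;
}

/* Point at which new allocations are refused: twice the gc threshold. */
unsigned long gc_refuse_limit(void)
{
	return 2 * XFRM_GC_THRESH;
}

/* Returns 1 if the gc limit is reached before the flow cache limit. */
int gc_limit_hit_first(unsigned long ncpus)
{
	return gc_refuse_limit() < flowcache_limit(ncpus);
}
```

At 16 cpus both limits are 65536 entries, so the flow cache limit still applies; from 17 cpus on, the gc limit (before this change) is reached first.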

Re: [PATCH v2 net-next 6/6] net: switchdev: extract struct switchdev_obj_*

2015-09-30 Thread Jiri Pirko

Tue, Sep 29, 2015 at 06:07:18PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Now that switchdev and its drivers directly use specific switchdev_obj_*
>structures, move them out of the switchdev_obj union and get rid of this
>outer structure.
>
>Signed-off-by: Vivien Didelot 
>---
> include/net/switchdev.h | 53 -
> 1 file changed, 26 insertions(+), 27 deletions(-)
>
>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>index bcadac3..e11425e 100644
>--- a/include/net/switchdev.h
>+++ b/include/net/switchdev.h
>@@ -64,30 +64,29 @@ enum switchdev_obj_id {
>   SWITCHDEV_OBJ_PORT_FDB,
> };
> 
>-struct switchdev_obj {
>-  enum switchdev_obj_id id;
>-  int (*cb)(struct switchdev_obj *obj);
>-  union {
>-  struct switchdev_obj_vlan { /* PORT_VLAN */
>-  u16 flags;
>-  u16 vid_begin;
>-  u16 vid_end;
>-  } vlan;
>-  struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
>-  u32 dst;
>-  int dst_len;
>-  struct fib_info *fi;
>-  u8 tos;
>-  u8 type;
>-  u32 nlflags;
>-  u32 tb_id;
>-  } ipv4_fib;
>-  struct switchdev_obj_fdb {  /* PORT_FDB */
>-  const unsigned char *addr;
>-  u16 vid;
>-  u16 ndm_state;
>-  } fdb;
>-  } u;
>+/* SWITCHDEV_OBJ_PORT_VLAN */
>+struct switchdev_obj_vlan {
>+  u16 flags;
>+  u16 vid_begin;
>+  u16 vid_end;
>+};
>+
>+/* SWITCHDEV_OBJ_IPV4_FIB */
>+struct switchdev_obj_ipv4_fib {
>+  u32 dst;
>+  int dst_len;
>+  struct fib_info *fi;
>+  u8 tos;
>+  u8 type;
>+  u32 nlflags;
>+  u32 tb_id;
>+};
>+
>+/* SWITCHDEV_OBJ_PORT_FDB */
>+struct switchdev_obj_fdb {
>+  const unsigned char *addr;
>+  u16 vid;
>+  u16 ndm_state;
> };


I don't like these structs being passed down as a "void *". I think that
we should have some "common" struct for these objects, even if it is
empty, and pass that down. "void *" does not look good at all and does not
tell the reader what that param is about. How about:

struct switchdev_obj {
};

struct switchdev_obj_vlan {
struct switchdev_obj obj;
u16 flags;
u16 vid_begin;
u16 vid_end;
};
#define SWITCHDEV_OBJ_VLAN(obj) \
container_of(obj, struct switchdev_obj_vlan, obj)

/* SWITCHDEV_OBJ_IPV4_FIB */
struct switchdev_obj_ipv4_fib {
struct switchdev_obj obj;
u32 dst;
int dst_len;
struct fib_info *fi;
u8 tos;
u8 type;
u32 nlflags;
u32 tb_id;
};
#define SWITCHDEV_OBJ_IPV4_FIB(obj) \
container_of(obj, struct switchdev_obj_ipv4_fib, obj)

/* SWITCHDEV_OBJ_PORT_FDB */
struct switchdev_obj_fdb {
struct switchdev_obj obj;
const unsigned char *addr;
u16 vid;
u16 ndm_state;
};
#define SWITCHDEV_OBJ_FDB(obj)  \
container_of(obj, struct switchdev_obj_fdb, obj)
 

Then pass struct switchdev_obj *obj down to the drivers and, in the driver,
get the original object with SWITCHDEV_OBJ_*?
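
For readers unfamiliar with the pattern, here is a user-space sketch of the embedding proposed above, under a few assumptions: container_of() is re-implemented with offsetof() (the kernel macro also does type checking), the common struct gets a dummy member because an empty struct is a GNU C extension, and model_port_vlan_add() is a hypothetical driver hook. The macro parameter is spelled OBJ so that it is not substituted into the member name "obj".

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef unsigned short u16;

/* Common part; the dummy member keeps this sketch valid ISO C. */
struct switchdev_obj {
	int unused;
};

struct switchdev_obj_vlan {
	struct switchdev_obj obj;
	u16 flags;
	u16 vid_begin;
	u16 vid_end;
};

/* OBJ (not obj) so the third argument stays the literal member name. */
#define SWITCHDEV_OBJ_VLAN(OBJ) \
	container_of(OBJ, struct switchdev_obj_vlan, obj)

/* Hypothetical driver hook: gets the generic pointer, recovers the VLAN object. */
int model_port_vlan_add(struct switchdev_obj *obj)
{
	struct switchdev_obj_vlan *vlan = SWITCHDEV_OBJ_VLAN(obj);

	/* Number of VLAN ids in the inclusive range. */
	return vlan->vid_end - vlan->vid_begin + 1;
}
```

The caller would pass &vlan.obj, so the parameter type documents what is being handed down while the driver still reaches the specific structure.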


Re: [PATCH v2 net-next 4/6] net: switchdev: pass callback to dump operation

2015-09-30 Thread Jiri Pirko

Tue, Sep 29, 2015 at 06:07:16PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Similar to the notifier_call callback of a notifier_block, change the
>function signature of switchdev dump operation to:
>
>int switchdev_port_obj_dump(struct net_device *dev,
>enum switchdev_obj_id id, void *obj,
>int (*cb)(void *obj));
>
>This allows the caller to pass and expect back a specific
>switchdev_obj_* structure instead of the generic switchdev_obj one.
>
>Driver implementations of the dump operation can now expect this specific
>structure and call the callback with it. Drivers have been changed
>accordingly.
>
>Signed-off-by: Vivien Didelot 
>---
> drivers/net/ethernet/rocker/rocker.c | 21 +
> include/net/switchdev.h  |  9 +---
> net/dsa/slave.c  | 26 +++--
> net/switchdev/switchdev.c| 45 ++--
> 4 files changed, 53 insertions(+), 48 deletions(-)
>
>diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>index 78fd443..107adb6 100644
>--- a/drivers/net/ethernet/rocker/rocker.c
>+++ b/drivers/net/ethernet/rocker/rocker.c
>@@ -4538,10 +4538,10 @@ static int rocker_port_obj_del(struct net_device *dev,
> }
> 
> static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
>-  struct switchdev_obj *obj)
>+  struct switchdev_obj_fdb *fdb,
>+  int (*cb)(void *obj))

We should have a typedef for this.
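
One possible shape for such a typedef, as a user-space sketch. The name switchdev_obj_dump_cb_t is an assumption made for illustration (the thread does not propose one), and model_fdb_dump() and model_cb() are hypothetical stand-ins for a driver dump routine and its callback.

```c
/* Hypothetical name; the callback signature is the one from the patch. */
typedef int switchdev_obj_dump_cb_t(void *obj);

/* Stand-in for a driver dump routine: calls cb per entry, stops on error. */
int model_fdb_dump(int *entries, int count, switchdev_obj_dump_cb_t *cb)
{
	int i, err;

	for (i = 0; i < count; i++) {
		err = cb(&entries[i]);
		if (err)
			return err;
	}

	return 0;
}

/* Example callback: rejects negative entries. */
int model_cb(void *obj)
{
	return *(int *)obj < 0 ? -1 : 0;
}
```

With the typedef, each function taking the callback declares "switchdev_obj_dump_cb_t *cb" instead of repeating the full function-pointer type.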

Re: [PATCH v2 net-next 0/6] net: switchdev: use specific switchdev_obj_*

2015-09-30 Thread Jiri Pirko

Tue, Sep 29, 2015 at 06:07:12PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>This patchset changes switchdev add, del, dump operations from this:
>
>int (*switchdev_port_obj_add)(struct net_device *dev,
>  struct switchdev_obj *obj,
>  struct switchdev_trans *trans);
>int (*switchdev_port_obj_del)(struct net_device *dev,
>  struct switchdev_obj *obj);
>int (*switchdev_port_obj_dump)(struct net_device *dev,
>  struct switchdev_obj *obj);
>
>to something similar to the notifier_call callback of a notifier_block:
>
>int (*switchdev_port_obj_add)(struct net_device *dev,
>  enum switchdev_obj_id id,
>  const void *obj,
>  struct switchdev_trans *trans);  
>
>int (*switchdev_port_obj_del)(struct net_device *dev,
>  enum switchdev_obj_id id,
>  const void *obj);
>int (*switchdev_port_obj_dump)(struct net_device *dev,
>   enum switchdev_obj_id id, void *obj,
>   int (*cb)(void *obj));
>
>This allows the caller to pass and expect back a specific switchdev_obj_*
>structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.
>
>This will simplify pushing the callback function down to the drivers.
>
>The first 3 patches get rid of the dev parameter of the dump callback, since it
>is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers)
>may not have easy access to it.
>
>Patches 4 and 5 implement the change in the switchdev operations and its users.
>
>Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
>removes this last one.


How about attrs? We should keep objs and attrs api consistent.

[PATCH v5 01/22] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-09-30 Thread Julien Grall

The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Acked-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: netdev@vger.kernel.org

Changes in v4:
- Add Wei's acked-by

Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index ec98d43..c4e6c02 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -288,6 +288,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
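BUG_ON(size + offset > PAGE_SIZE
[...]
-   if (skb_is_gso(skb)) {
-   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
-   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)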
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4


[PATCH] amd-xgbe: fix potential memory leak in xgbe-debugfs

2015-09-30 Thread Geliang Tang

Added kfree() to avoid the memory leak when debugfs_create_dir() fails.

Signed-off-by: Geliang Tang 
---
 drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
index 2c063b6..66137ff 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c
@@ -330,6 +330,7 @@ void xgbe_debugfs_init(struct xgbe_prv_data *pdata)
pdata->xgbe_debugfs = debugfs_create_dir(buf, NULL);
if (!pdata->xgbe_debugfs) {
netdev_err(pdata->netdev, "debugfs_create_dir failed\n");
+   kfree(buf);
return;
}
 
-- 
1.9.1



[PATCH v5 17/22] net/xen-netfront: Make it running on 64KB page granularity

2015-09-30 Thread Julien Grall

The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to use a
network device on an unmodified Xen.

It's only necessary to adapt the ring size and break the skb data into
small chunks of 4KB. The rest of the code relies on the grant table code.

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.
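
The splitting described above can be sketched in user space roughly as
follows. This is only an illustration under stated assumptions, not the
driver code: XEN_CHUNK_SIZE, for_each_chunk(), count_cb() and
count_chunks() are hypothetical names made up for this example, while the
patch itself relies on the grant table helpers.

```c
/* Assumption: the PV protocol granularity is 4KB, as stated above. */
#define XEN_CHUNK_SIZE 4096u

/* Hypothetical callback type: called once per 4KB-bounded chunk. */
typedef void (*chunk_fn)(unsigned int offset, unsigned int len, void *data);

/*
 * Walk [offset, offset + len) inside one large (e.g. 64KB) page and
 * hand out pieces that never cross a 4KB boundary.
 */
static void for_each_chunk(unsigned int offset, unsigned int len,
                           chunk_fn fn, void *data)
{
    while (len) {
        /* Bytes left before the next 4KB boundary. */
        unsigned int room = XEN_CHUNK_SIZE - (offset % XEN_CHUNK_SIZE);
        unsigned int chunk = len < room ? len : room;

        fn(offset, chunk, data);
        offset += chunk;
        len -= chunk;
    }
}

/* Example callback: count how many chunks (one request each) are needed. */
static void count_cb(unsigned int offset, unsigned int len, void *data)
{
    (void)offset;
    (void)len;
    ++*(unsigned int *)data;
}

static unsigned int count_chunks(unsigned int offset, unsigned int len)
{
    unsigned int n = 0;

    for_each_chunk(offset, len, count_cb, &n);
    return n;
}
```

With these definitions a full 64KB page yields 16 chunks, and a 4096-byte
range starting at offset 100 yields two, because it straddles one 4KB
boundary.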

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: netdev@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on an unmodified Xen.

Tested with workloads such as ping, ssh, wget, git... I would be happy if
someone could give details on how to test all the paths.

Changes in v4:
- s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- xennet_tx_setup_grant was calling itself, resulting in a
guest stall when using iperf.
- The grant callback no longer allows changing the len
(wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_foreach_grant_foreign_ref_one

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page in grant
- Fix count slots
---
 drivers/net/xen-netfront.c | 122 -
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f821a97..badca31 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long gfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   gfn = 
xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   gfn,
-   0);
-
+   gnttab_page_grant_foreign_access_ref_one(ref,
+
queue->info->xbdev->otherend_id,
+page,
+0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_

[PATCH v5 18/22] net/xen-netback: Make it running on 64KB page granularity

2015-09-30 Thread Julien Grall

The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to work as a
network backend on an unmodified Xen.

It's only necessary to adapt the ring size and break the skb data into
small chunks of 4KB. The rest of the code relies on the grant table code.

Signed-off-by: Julien Grall 
Reviewed-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: netdev@vger.kernel.org

Improvements such as support for 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on an unmodified Xen.

Note that I haven't added a comment explaining why the offset is 0 after
the first iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

Wei, I have kept your reviewed-by because the conflict during rebase
with 1d5d48523900a4b0f25d6b52f1a93c84bd671186
"xen-netback: require fewer guest Rx slots when not using GSO" was
trivial to resolve. It basically drops XEN_NETBK_RX_SLOTS_MAX and
replaces the new PAGE_SIZE with XEN_PAGE_SIZE. Let me know if it's not
okay.

Changes in v5:
- Add Wei's reviewed-by
- Fix conflict with 1d5d48523900a4b0f25d6b52f1a93c84bd671186

Changes in v4:
- Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- The grant callback no longer allows using less data. A
helper has been added in netback to handle this.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnttab_foreach_grant to split a Linux page into grants
---
 drivers/net/xen-netback/common.h  |  16 ++--
 drivers/net/xen-netback/netback.c | 157 --
 2 files changed, 111 insertions(+), 62 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index a7bf747..0333ab0 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index c4e6c02..e481f37 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -152,9 +152,9 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 static int xenvif_rx_ring_slots_needed(struct xenvif *vif)
 {
if (vif->gso_mask)
-   return DIV_ROUND_UP(vif->dev->gso_max_size, PAGE_SIZE) + 1;
+   return DIV_ROUND_UP(vif->dev->gso_max_size, XEN_PAGE_SIZE) + 1;
else
-   return DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
+   return DIV_ROUND_UP(vif->dev->mtu, XEN_PAGE_SIZE);
 }
 
 static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue)
@@ -274,6 +274,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy {
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Rainer Weikusat

Mathias Krause  writes:
> On 29 September 2015 at 21:09, Jason Baron  wrote:
>> However, if we call connect on socket 's', to connect to a new socket 'o2',
>> we drop the reference on the original socket 'o'. Thus, we can now close socket
>> 'o' without unregistering from epoll. Then, when we either close the ep
>> or unregister 'o', we end up with this list corruption. Thus, this is not a
>> race per se, but can be triggered sequentially.
>
> Sounds profound, but the reproducer calls connect only once per
> socket. So there is no "connect to a new socket", no?
> But w/e, see below.

In case you want some information on this: This is a kernel warning I
could trigger (more than once) on the single day I could so far spend
looking into this (3.2.54 kernel):

Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 list_del+0x9/0x30()
Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should be 88022c38f078, but was dead00100100
Sep 15 19:37:19 doppelsaurus kernel: Modules linked in: snd_hrtimer binfmt_misc af_packet nf_conntrack loop snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm sg snd_page_alloc snd_seq_dummy sr_mod snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer ath9k snd cdrom ath9k_common ath9k_hw r8169 mii ath usb_storage unix
Sep 15 19:37:19 doppelsaurus kernel: Pid: 3340, comm: a.out Tainted: GW    3.2.54-saurus-vesa #9
Sep 15 19:37:19 doppelsaurus kernel: Call Trace:
Sep 15 19:37:19 doppelsaurus kernel: [] ? __list_del_entry+0x80/0xc0
Sep 15 19:37:19 doppelsaurus kernel: [] ? warn_slowpath_common+0x79/0xc0
Sep 15 19:37:19 doppelsaurus kernel: [] ? warn_slowpath_fmt+0x45/0x50
Sep 15 19:37:19 doppelsaurus kernel: [] ? list_del+0x9/0x30
Sep 15 19:37:19 doppelsaurus kernel: [] ? remove_wait_queue+0x29/0x50
Sep 15 19:37:19 doppelsaurus kernel: [] ? ep_unregister_pollwait.isra.9+0x32/0x50
Sep 15 19:37:19 doppelsaurus kernel: [] ? ep_remove+0x2a/0xc0
Sep 15 19:37:19 doppelsaurus kernel: [] ? eventpoll_release_file+0x5e/0x90
Sep 15 19:37:19 doppelsaurus kernel: [] ? fput+0x1c6/0x220
Sep 15 19:37:19 doppelsaurus kernel: [] ? filp_close+0x5f/0x90
Sep 15 19:37:19 doppelsaurus kernel: [] ? sys_close+0x86/0xd0
Sep 15 19:37:19 doppelsaurus kernel: [] ? system_call_fastpath+0x16/0x1b

The dead00100100 is one of the list poison values a linkage pointer
is set to during/after removal from a list. The particular warning
means that entry->prev (entry being the item being removed) pointed to
another entry whose next pointer was not the address of entry but
dead00100100. Most likely, this means there's a list insert racing
with a list remove somewhere here: the insert picks up the pointer
to the previous item while it is still on the list and uses it while the
delete removes it. The delete has the last word and thus sets
prev->next to dead00100100 after the insert set it to the address of
the item to be inserted.
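
To make the suspected interleaving concrete, here is a small user-space
sketch that replays it by hand on a toy list. It is not the kernel code:
the poison constants, list_add_after(), checked_list_del() and
replay_race() are stand-ins written for this illustration, and the order
of the steps is my assumption about how the race could play out.

```c
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Stand-in poison values for this sketch; the kernel defines its own. */
#define POISON_NEXT ((struct list_head *)0x100)
#define POISON_PREV ((struct list_head *)0x200)

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after prev. */
static void list_add_after(struct list_head *entry, struct list_head *prev)
{
    struct list_head *next = prev->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/*
 * Delete with sanity checks, in the spirit of the debug list code.
 * Returns 0 on success, -1 (leaving the list untouched) on corruption.
 */
static int checked_list_del(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (next == POISON_NEXT || prev == POISON_PREV)
        return -1;      /* entry was already deleted */
    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return -1;
    }
    if (next->prev != entry)
        return -1;

    next->prev = prev;
    prev->next = next;
    entry->next = POISON_NEXT;
    entry->prev = POISON_PREV;
    return 0;
}

/*
 * Replay the suspected interleaving: an insert of b after a samples its
 * neighbours, a is unlinked, the insert completes, and the delete of a
 * poisons a's pointers last.
 */
static int replay_race(void)
{
    struct list_head head, a, b;
    struct list_head *prev, *next;

    list_init(&head);
    list_add_after(&a, &head);

    /* insert: sample neighbours while a is still on the list */
    prev = &a;
    next = a.next;

    /* delete: unlink a */
    a.next->prev = a.prev;
    a.prev->next = a.next;

    /* insert: link b using the stale prev pointer */
    next->prev = &b;
    b.next = next;
    b.prev = prev;
    prev->next = &b;

    /* delete: poison a, overwriting the pointer the insert just set */
    a.next = POISON_NEXT;
    a.prev = POISON_PREV;

    /* removing b now trips the prev->next check */
    return checked_list_del(&b);
}
```

Removing b at the end fails the prev->next check because b.prev still
points at the already poisoned a, which is the same shape as the warning
in the log.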

Re: [PATCH] net ipv4: use preferred log methods

2015-09-30 Thread Bastian Stender


Hi,

On 09/30/2015 11:20 AM, Bastian Stender wrote:

Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.

Signed-off-by: Bastian Stender 
---
  net/ipv4/ipconfig.c| 53 +-
  net/ipv4/netfilter/arp_tables.c| 17 +
  net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |  2 +-
  net/ipv4/netfilter/nf_nat_snmp_basic.c | 31 ---
  4 files changed, 36 insertions(+), 67 deletions(-)


Please ignore my previous patch. I'll test it again and resubmit it.

Thanks for your patience.

Regards,
Bastian Stender

pull-request: can 2015-09-30

2015-09-30 Thread Marc Kleine-Budde

Hello David,

this is a pull request of a single patch for 4.3.

The patch is by Stephane Grosjean and adds support for the peak OEM PCI card to
the peak_pci driver by adding its device ID.

regards,
Marc
---

The following changes since commit b84f78782052ee4516903e5d0566a5eee365b771:

  net: Initialize flow flags in input path (2015-09-29 21:52:32 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git 
tags/linux-can-fixes-for-4.3-20150930

for you to fetch changes up to 7253054e5d05233063c48f57ac02283bd35753d8:

  can: peak_pci: add unused device id. in devices table (2015-09-30 12:57:58 
+0200)


linux-can-fixes-for-4.3-20150930


Stephane Grosjean (1):
  can: peak_pci: add unused device id. in devices table

 drivers/net/can/sja1000/peak_pci.c | 1 +
 1 file changed, 1 insertion(+)


[PATCH] can: peak_pci: add unused device id. in devices table

2015-09-30 Thread Marc Kleine-Budde

From: Stephane Grosjean 

While new PEAK_PCIE_OEM_ID has been defined since 3.17, no corresponding
entry has been added in the peak_pci_tbl[] of the peak_pci CAN driver.

This patch now enables users of the PCAN-PCI Express OEM card to run the
peak_pci driver too.

Signed-off-by: Stephane Grosjean 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/sja1000/peak_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/sja1000/peak_pci.c 
b/drivers/net/can/sja1000/peak_pci.c
index e5fac368068a..131026fbc2d7 100644
--- a/drivers/net/can/sja1000/peak_pci.c
+++ b/drivers/net/can/sja1000/peak_pci.c
@@ -87,6 +87,7 @@ static const struct pci_device_id peak_pci_tbl[] = {
{PEAK_PCI_VENDOR_ID, PEAK_PC_104P_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
{PEAK_PCI_VENDOR_ID, PEAK_PCI_104E_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
{PEAK_PCI_VENDOR_ID, PEAK_CPCI_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
+   {PEAK_PCI_VENDOR_ID, PEAK_PCIE_OEM_ID, PCI_ANY_ID, PCI_ANY_ID,},
 #ifdef CONFIG_CAN_PEAK_PCIEC
{PEAK_PCI_VENDOR_ID, PEAK_PCIEC_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
{PEAK_PCI_VENDOR_ID, PEAK_PCIEC34_DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID,},
-- 
2.6.0


Re: [PATCH bluetooth-next 0/4] ieee802154: add llsec support over nl802154

2015-09-30 Thread Marcel Holtmann

Hi Alex,

> this patch series will add llsec support for nl802154.
> 
> What is "llsec"?
> 
> The llsec (I suppose it stands for link-layer security) is part of the SoftMAC
> implementation of 802.15.4, "net/mac802154/llsec.c". The 802.15.4 standard
> describes a security mechanism over ACLs. llsec will do the
> encryption/decryption. To access llsec we need an interface for nl802154. The
> 802.15.4 standard describes the PHY/MAC layer, and we have "possibly" similar
> paradigms to wireless, with SoftMAC and HardMAC drivers. (We don't support
> HardMAC transceivers right now; I never had any HardMAC transceivers, they are
> really expensive, and there are only a few which can also run in a "raw" mode.)
> Anyway, nl802154 should access SoftMAC/HardMAC drivers to abstract "one
> interface to userspace".
> 
> These ACLs are known as "security tables" inside the MAC information base
> (MIB) of the 802.15.4 standard, the security MIB.
> 
> The final goal for providing these tables to userspace is an "iptables"-like
> handling of "store" and "restore" through the userspace application "iwpan",
> which contains the general "framework mechanism" like the wireless "iw" tool;
> you can then add/del entries in these security tables.
> 
> I haven't yet looked at how "exactly" the iptables userspace application does
> the store and restore mechanism. The current way is a very KISS handling:
> 
> We add netlink cmds to add/del the table entries. Over the dump callback
> it's possible to get all information, which is printed out as the command
> line string "iwpan dev $WPAN_DEV $TABLE add ...". The restore script will
> simply export the $WPAN_DEV variable to restore this configuration for a
> specific interface.
> 
> I will send the userspace patches as well to netdev, maybe somebody wants
> to know what I did there for first support.
> 
> This sounds weird, but it is a somewhat acceptable use-case to support llsec.
> The final goal is to look up how iptables works and make a nicer C
> implementation. There is currently no "officially supported" userspace tool
> which supports accessing "llsec".
> 
> I added several TODOs to the current implementation and added a new:
> 
> CONFIG_IEEE802154_NL802154_EXPERIMENTAL
> 
> This config will not build the nl802154 llsec layer and reduce the MAX_ATTR
> attribute of the nl802154 interface. With this config I explicitly say this
> interface over nl802154 is still in development and will be changed later.
> 
> The 802.15.4 subsystem is still in EXPERIMENTAL state; there was some commit
> f4671a90c418b5aae14b61a9fc9d79c629403ca0 ("net/ieee802154: remove depends on
> CONFIG_EXPERIMENTAL") which is fine, but no maintainer ever said it's not
> experimental anymore.
> 
> Checkpatch will complain about some lines wider than 80 chars; at these places
> I ignore the warnings, otherwise the code looks awful in my opinion.
> 
> My current working repository is still bluetooth-next/master. David if
> everything is fine, then please ack patch "[PATCH bluetooth-next 1/4]
> netlink: add nla_get for le32 and le64", so Marcel can apply it. Thanks.
> 
> - Alex
> 
> Alexander Aring (4):
>  netlink: add nla_get for le32 and le64
>  nl802154: use nla_get_le64 for get extended addr
>  nl802154: add support for security layer
>  mac802154: add comments for llsec issues
> 
> include/net/cfg802154.h |  131 
> include/net/ieee802154_netdev.h |   75 ---
> include/net/netlink.h   |   18 +
> include/net/nl802154.h  |  191 ++
> net/ieee802154/Kconfig  |5 +
> net/ieee802154/core.c   |   12 +
> net/ieee802154/core.h   |1 +
> net/ieee802154/nl802154.c   | 1320 ---
> net/ieee802154/rdev-ops.h   |  109 
> net/mac802154/cfg.c |  205 ++
> net/mac802154/rx.c  |4 +
> net/mac802154/tx.c  |4 +
> 12 files changed, 1903 insertions(+), 172 deletions(-)

all 4 patches have been applied to bluetooth-next tree.

Regards

Marcel


[PATCH 09/12] ipv6: use ktime_t for internal timestamps

2015-09-30 Thread Arnd Bergmann

The ipv6 mip6 implementation is one of only a few users of the
skb_get_timestamp() function in the kernel, which is both unsafe
on 32-bit architectures because of the 2038 overflow, and slightly
less efficient than the skb_get_ktime() based approach.

This converts the function call and the mip6_report_rate_limiter
structure that stores the time stamp, eliminating all uses of
timeval in the ipv6 code.

Signed-off-by: Arnd Bergmann 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
---
 net/ipv6/mip6.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index b9779d441b12..60c79a08e14a 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -118,7 +118,7 @@ static int mip6_mh_filter(struct sock *sk, struct sk_buff 
*skb)
 
 struct mip6_report_rate_limiter {
spinlock_t lock;
-   struct timeval stamp;
+   ktime_t stamp;
int iif;
struct in6_addr src;
struct in6_addr dst;
@@ -184,20 +184,18 @@ static int mip6_destopt_output(struct xfrm_state *x, 
struct sk_buff *skb)
return 0;
 }
 
-static inline int mip6_report_rl_allow(struct timeval *stamp,
+static inline int mip6_report_rl_allow(ktime_t stamp,
   const struct in6_addr *dst,
   const struct in6_addr *src, int iif)
 {
int allow = 0;
 
spin_lock_bh(&mip6_report_rl.lock);
-   if (mip6_report_rl.stamp.tv_sec != stamp->tv_sec ||
-   mip6_report_rl.stamp.tv_usec != stamp->tv_usec ||
+   if (!ktime_equal(mip6_report_rl.stamp, stamp) ||
mip6_report_rl.iif != iif ||
!ipv6_addr_equal(&mip6_report_rl.src, src) ||
!ipv6_addr_equal(&mip6_report_rl.dst, dst)) {
-   mip6_report_rl.stamp.tv_sec = stamp->tv_sec;
-   mip6_report_rl.stamp.tv_usec = stamp->tv_usec;
+   mip6_report_rl.stamp = stamp;
mip6_report_rl.iif = iif;
mip6_report_rl.src = *src;
mip6_report_rl.dst = *dst;
@@ -216,7 +214,7 @@ static int mip6_destopt_reject(struct xfrm_state *x, struct 
sk_buff *skb,
struct ipv6_destopt_hao *hao = NULL;
struct xfrm_selector sel;
int offset;
-   struct timeval stamp;
+   ktime_t stamp;
int err = 0;
 
if (unlikely(fl6->flowi6_proto == IPPROTO_MH &&
@@ -230,9 +228,9 @@ static int mip6_destopt_reject(struct xfrm_state *x, struct 
sk_buff *skb,
(skb_network_header(skb) + offset);
}
 
-   skb_get_timestamp(skb, &stamp);
+   stamp = skb_get_ktime(skb);
 
-   if (!mip6_report_rl_allow(&stamp, &ipv6_hdr(skb)->daddr,
+   if (!mip6_report_rl_allow(stamp, &ipv6_hdr(skb)->daddr,
  hao ? &hao->addr : &ipv6_hdr(skb)->saddr,
  opt->iif))
goto out;
-- 
2.1.0.rc2


[PATCH 11/12] [RFC] ipv4: avoid timespec in timestamp computation

2015-09-30 Thread Arnd Bergmann

This is an attempt to avoid the use of timespec in ipv4, where
getnstimeofday() used to be used for computing the number of
milliseconds since midnight, in three places.

That computation would overflow in 2038 on 32-bit machines,
and the normal workaround for this is to use timespec64, which
in turn requires an expensive div_s64_mod() function call
for calculating the seconds modulo 86400.

Instead, this approach introduces a new generic helper function
that does this more efficiently, by using only a 32-bit modulo
(which the compiler can turn into two multiplications), relying
on 39 bits to be sufficient for the current time of day. This
is roughly 100 times faster than a full divmod operation on ARM.

As a further optimization, this does not use the exact nanosecond
value but instead relies on tk_xtime() to report the time of the
last jiffy, which is slightly less accurate, depending on the
value of HZ.
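
The arithmetic behind the 32-bit modulo can be checked in user space with
a sketch like the one below. It only illustrates the identity
86400 == 128 * 675; ms_of_day_ref() and ms_of_day_split() are hypothetical
names for this example, and the split version is only valid while the
seconds value fits in 39 bits, so that the shifted value fits in 32 bits.

```c
#include <stdint.h>

#define MSEC_PER_SEC 1000u

/* Reference: plain 64-bit modulo, the expensive path on 32-bit machines. */
static uint32_t ms_of_day_ref(int64_t sec)
{
    return (uint32_t)(sec % 86400) * MSEC_PER_SEC;
}

/*
 * 86400 == 128 * 675. Writing sec = 128 * q + r with r = sec & 0x7f gives
 * sec % 86400 == 128 * (q % 675) + r, so only a 32-bit modulo is needed
 * as long as q = sec >> 7 fits in 32 bits (sec below 2^39).
 */
static uint32_t ms_of_day_split(int64_t sec)
{
    uint32_t low = (uint32_t)(sec & 0x7f);
    uint32_t high = (uint32_t)(sec >> 7) % 675;

    return low * MSEC_PER_SEC + high * 0x80 * MSEC_PER_SEC;
}
```

The largest result is 127 * 1000 + 674 * 128 * 1000 = 86399000, which is
the last full second of a day in milliseconds and fits comfortably in a
u32.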

Signed-off-by: Arnd Bergmann 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Cc: John Stultz 
Cc: Thomas Gleixner 
---
 include/linux/timekeeping.h |  2 ++
 kernel/time/timekeeping.c   | 34 ++
 net/ipv4/icmp.c |  8 +++-
 net/ipv4/ip_options.c   |  9 ++---
 4 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ca2eaa9077eb..3e126f8c2876 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -44,6 +44,8 @@ extern void ktime_get_ts64(struct timespec64 *ts);
 extern time64_t ktime_get_seconds(void);
 extern time64_t ktime_get_real_seconds(void);
 
+extern u32 ktime_get_ms_since_midnight(void);
+
 extern int __getnstimeofday64(struct timespec64 *tv);
 extern void getnstimeofday64(struct timespec64 *tv);
 extern void getboottime64(struct timespec64 *ts);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ed5049ff94c5..3a1e030fa969 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -846,6 +846,40 @@ time64_t ktime_get_real_seconds(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_seconds);
 
+#if IS_ENABLED(CONFIG_INET)
+u32 ktime_get_ms_since_midnight(void)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   struct timespec64 now;
+   unsigned long seq;
+   u32 ms;
+
+   /* we assume that the coarse time is good enough here */
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+
+   now = tk_xtime(tk);
+   } while (read_seqcount_retry(&tk_core.seq, seq));
+
+   /*
+* efficiently calculate the milliseconds since midnight:
+* 86400 seconds per day == 2^7 * 675, which helps us
+* replace an expensive div_s64_rem() with a hand-written
+* 39-bit modulo on 32-bit architectures.
+*/
+   if (!IS_ENABLED(CONFIG_64BIT))
+   ms = (now.tv_sec & 0x7f) * MSEC_PER_SEC +
+((u32)(now.tv_sec >> 7) % 675) * 0x80 * MSEC_PER_SEC;
+   else
+   ms = (now.tv_sec % 86400) * MSEC_PER_SEC;
+
+   ms += now.tv_nsec / NSEC_PER_MSEC;
+
+   return ms;
+}
+EXPORT_SYMBOL_GPL(ktime_get_ms_since_midnight);
+#endif
+
 #ifdef CONFIG_NTP_PPS
 
 /**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e5eb8ac4089d..b1c53f2f7bd5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -914,7 +914,6 @@ static bool icmp_echo(struct sk_buff *skb)
  */
 static bool icmp_timestamp(struct sk_buff *skb)
 {
-   struct timespec tv;
struct icmp_bxm icmp_param;
/*
 *  Too short.
@@ -923,11 +922,10 @@ static bool icmp_timestamp(struct sk_buff *skb)
goto out_err;
 
/*
-*  Fill in the current time as ms since midnight UT:
+*  Fill in the current time as ms since midnight UT,
+*  this could probably be done faster.
 */
-   getnstimeofday(&tv);
-   icmp_param.data.times[1] = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC +
-tv.tv_nsec / NSEC_PER_MSEC);
+   icmp_param.data.times[1] = htonl(ktime_get_ms_since_midnight());
icmp_param.data.times[2] = icmp_param.data.times[1];
if (skb_copy_bits(skb, 0, &icmp_param.data.times[0], 4))
BUG();
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index bd246792360b..339ce528ecae 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -58,10 +58,8 @@ void ip_options_build(struct sk_buff *skb, struct ip_options 
*opt,
if (opt->ts_needaddr)
ip_rt_get_source(iph+opt->ts+iph[opt->ts+2]-9, skb, rt);
if (opt->ts_needtime) {
-   struct timespec tv;
__be32 midtime;
-   getnstimeofday(&tv);
-   midtime = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC + 
tv.tv_nsec / NSEC_PER_MSEC);
+   midtime = htonl(ktime_get_ms_since_mi

[PATCH 07/12] atm: hide 'struct zatm_t_hist'

2015-09-30 Thread Arnd Bergmann

The zatm_t_hist structure is not used anywhere in the kernel, but is
exported to user space. As we are trying to eliminate uses of time_t
in the kernel for y2038 compatibility, the current definition triggers
checking tools because it contains 'struct timeval'.

We can work around this by adding '#ifndef __KERNEL__'. I could not find
out what the structure is actually used for, so this is the safe choice
in case there is some user space tool that relies on the definition.

If we are sure that nothing in user space relies on the structure, we
can instead remove the definition completely.

Signed-off-by: Arnd Bergmann 
Cc: Chas Williams <3ch...@gmail.com>
Cc: linux-atm-gene...@lists.sourceforge.net
---
 include/uapi/linux/atm_zatm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa29454f..2908ea86e6f2 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -35,11 +35,12 @@ struct zatm_pool_req {
struct zatm_pool_info info; /* actual information */
 };
 
+#ifndef __KERNEL__
 struct zatm_t_hist {
struct timeval real;/* real (wall-clock) time */
struct timeval expected;/* expected real time */
 };
-
+#endif
 
 #define ZATM_OAM_POOL  0   /* free buffer pool for OAM cells */
 #define ZATM_AAL0_POOL 1   /* free buffer pool for AAL0 cells */
-- 
2.1.0.rc2


[PATCH 12/12] [RFC] can: avoid using timeval for uapi

2015-09-30 Thread Arnd Bergmann

The can subsystem communicates with user space using a bcm_msg_head
header, which contains two timestamps. This is problematic for
multiple reasons:

a) The structure layout is currently incompatible between 64-bit
   user space and 32-bit user space, and cannot work in compat
   mode (other than x32).

b) The timeval structure layout will change in 32-bit user
   space when we fix the y2038 overflow problem by redefining
   time_t to 64-bit, making new 32-bit user space incompatible
   with the current kernel interface.
   Cars last a long time and often use old kernels, so the actual
   users of this code are the most likely ones to migrate to y2038
   safe user space.

This tries to work around part of the problem by changing the
publicly visible user interface in the header, but not the binary
interface. Fortunately, the values passed around in the structure
are relative times and do not actually suffer from the y2038
overflow, so 32-bit is enough here.

We replace the use of 'struct timeval' with a newly defined
'struct bcm_timeval' that uses the exact same binary layout
as before and that still suffers from problem a) but not problem
b).

The downside of this approach is that any user space program
that currently assigns a timeval structure to these members
rather than writing the tv_sec/tv_usec portions individually
will suffer a compile-time error when built with an updated
kernel header. Fixing this error makes it work fine with old
and new headers though.
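
The member-wise assignment mentioned above could look roughly like the
following user-space sketch. It assumes a local copy of the proposed
'struct bcm_timeval'; the helper name bcm_timeval_from_timeval() is
hypothetical and not part of this patch.

```c
#include <assert.h>
#include <sys/time.h>

/* Local copy of the layout proposed in this patch (two longs). */
struct bcm_timeval {
	long tv_sec;
	long tv_usec;
};

/*
 * Hypothetical helper: copy the fields one by one instead of assigning
 * the whole struct, so the caller builds against old and new headers.
 */
static struct bcm_timeval bcm_timeval_from_timeval(struct timeval tv)
{
	struct bcm_timeval btv;

	/* A normalized timeval keeps microseconds below one second. */
	assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);

	/* Intervals are relative times, so seconds are expected to fit. */
	btv.tv_sec = (long)tv.tv_sec;
	btv.tv_usec = (long)tv.tv_usec;
	return btv;
}
```

A caller would then write 'msg.ival1 = bcm_timeval_from_timeval(tv);'
rather than 'msg.ival1 = tv;', or simply assign the two fields directly.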

We could address problem a) by using '__u32' or 'int' members
rather than 'long', but that would have a more significant
downside in also breaking support for all existing 64-bit user
binaries that might be using this interface, which is likely
not acceptable.

Signed-off-by: Arnd Bergmann 
Cc: Oliver Hartkopp 
Cc: Marc Kleine-Budde 
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 include/uapi/linux/can/bcm.h |  7 ++-
 net/can/bcm.c| 15 ++-
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/can/bcm.h b/include/uapi/linux/can/bcm.h
index 89ddb9dc9bdf..7a291dc1ff15 100644
--- a/include/uapi/linux/can/bcm.h
+++ b/include/uapi/linux/can/bcm.h
@@ -47,6 +47,11 @@
 #include 
 #include 
 
+struct bcm_timeval {
+   long tv_sec;
+   long tv_usec;
+};
+
 /**
  * struct bcm_msg_head - head of messages to/from the broadcast manager
  * @opcode:opcode, see enum below.
@@ -62,7 +67,7 @@ struct bcm_msg_head {
__u32 opcode;
__u32 flags;
__u32 count;
-   struct timeval ival1, ival2;
+   struct bcm_timeval ival1, ival2;
canid_t can_id;
__u32 nframes;
struct can_frame frames[0];
diff --git a/net/can/bcm.c b/net/can/bcm.c
index a1ba6875c2a2..6863310d6973 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -96,7 +96,7 @@ struct bcm_op {
canid_t can_id;
u32 flags;
unsigned long frames_abs, frames_filtered;
-   struct timeval ival1, ival2;
+   struct bcm_timeval ival1, ival2;
struct hrtimer timer, thrtimer;
struct tasklet_struct tsklet, thrtsklet;
ktime_t rx_stamp, kt_ival1, kt_ival2, kt_lastmsg;
@@ -131,6 +131,11 @@ static inline struct bcm_sock *bcm_sk(const struct sock 
*sk)
return (struct bcm_sock *)sk;
 }
 
+static inline ktime_t bcm_timeval_to_ktime(struct bcm_timeval tv)
+{
+   return ktime_set(tv.tv_sec, tv.tv_usec * NSEC_PER_USEC);
+}
+
 #define CFSIZ sizeof(struct can_frame)
 #define OPSIZ sizeof(struct bcm_op)
 #define MHSIZ sizeof(struct bcm_msg_head)
@@ -953,8 +958,8 @@ static int bcm_tx_setup(struct bcm_msg_head *msg_head, 
struct msghdr *msg,
op->count = msg_head->count;
op->ival1 = msg_head->ival1;
op->ival2 = msg_head->ival2;
-   op->kt_ival1 = timeval_to_ktime(msg_head->ival1);
-   op->kt_ival2 = timeval_to_ktime(msg_head->ival2);
+   op->kt_ival1 = bcm_timeval_to_ktime(msg_head->ival1);
+   op->kt_ival2 = bcm_timeval_to_ktime(msg_head->ival2);
 
/* disable an active timer due to zero values? */
if (!op->kt_ival1.tv64 && !op->kt_ival2.tv64)
@@ -1134,8 +1139,8 @@ static int bcm_rx_setup(struct bcm_msg_head *msg_head, 
struct msghdr *msg,
/* set timer value */
op->ival1 = msg_head->ival1;
op->ival2 = msg_head->ival2;
-   op->kt_ival1 = timeval_to_ktime(msg_head->ival1);
-   op->kt_ival2 = timeval_to_ktime(msg_head->ival2);
+   op->kt_ival1 = bcm_timeval_to_ktime(msg_head->ival1);
+   op->kt_ival2 = bcm_timeval_to_ktime(msg_head->ival2);
 
/* disable an active timer due to zero value? */
if (!op->kt_ival1.tv64)
-- 
2.1.0.rc2

[PATCH 10/12] net: sctp: avoid incorrect time_t use

2015-09-30 Thread Arnd Bergmann

We want to avoid using time_t in the kernel because of the y2038
overflow problem. The use in sctp is not for storing seconds at
all, but instead uses microseconds and is passed as 32-bit
on all machines.

This patch changes the type to u32, which better fits the use.

Signed-off-by: Arnd Bergmann 
Cc: Vlad Yasevich 
Cc: Neil Horman 
Cc: linux-s...@vger.kernel.org
---
 net/sctp/sm_make_chunk.c | 2 +-
 net/sctp/sm_statefuns.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 7954c52e1794..763e06a55155 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2494,7 +2494,7 @@ static int sctp_process_param(struct sctp_association 
*asoc,
__u16 sat;
int retval = 1;
sctp_scope_t scope;
-   time_t stale;
+   u32 stale;
struct sctp_af *af;
union sctp_addr_param *addr_param;
struct sctp_transport *t;
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d7eaa7354cf7..6f46aa16cb76 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2306,7 +2306,7 @@ static sctp_disposition_t sctp_sf_do_5_2_6_stale(struct 
net *net,
 sctp_cmd_seq_t *commands)
 {
struct sctp_chunk *chunk = arg;
-   time_t stale;
+   u32 stale;
sctp_cookie_preserve_param_t bht;
sctp_errhdr_t *err;
struct sctp_chunk *reply;
-- 
2.1.0.rc2

[PATCH 05/12] mwifiex: avoid gettimeofday in ba_threshold setting

2015-09-30 Thread Arnd Bergmann

mwifiex_get_random_ba_threshold() uses a complex homegrown implementation
to generate a pseudo-random number from the current time as returned
from do_gettimeofday().

This currently requires two 32-bit divisions plus a couple of other
computations that are eventually discarded as only eight bits of
the microsecond portion are used at all.

We could replace this with a call to get_random_bytes(), but that
might drain the entropy pool too fast if this is called for each
packet.

Instead, this patch converts it to use ktime_get_ns(), which is a
bit faster than do_gettimeofday(), and then uses a similar algorithm
as before, but in a way that takes both the nanosecond and second
portion into account for a slightly-more-but-still-not-very-random
pseudorandom number.
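
As a rough user-space illustration of the folding step described above,
the sketch below takes the nanosecond value as a parameter instead of
calling ktime_get_ns(). The two constant values are assumptions made for
the example; the driver headers define the real ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the driver defines its own. */
#define BA_SETUP_MAX_PACKET_THRESHOLD 16
#define BA_SETUP_PACKET_OFFSET 16

/*
 * Fold the upper bits of a nanosecond timestamp into the low byte and
 * map the result into
 * [BA_SETUP_PACKET_OFFSET,
 *  BA_SETUP_PACKET_OFFSET + BA_SETUP_MAX_PACKET_THRESHOLD - 1].
 */
static uint8_t ba_threshold_from_ns(uint64_t ns)
{
	uint8_t threshold;

	ns += (ns >> 32) + (ns >> 16);
	threshold = ((uint8_t)ns % BA_SETUP_MAX_PACKET_THRESHOLD) +
		    BA_SETUP_PACKET_OFFSET;

	/* The modulo keeps the value inside the documented range. */
	assert(threshold < BA_SETUP_PACKET_OFFSET +
			   BA_SETUP_MAX_PACKET_THRESHOLD);
	return threshold;
}
```

Only the low eight bits survive the cast, so the shifts exist purely to
let the seconds portion influence those bits.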

Signed-off-by: Arnd Bergmann 
Cc: Amitkumar Karwar 
Cc: Nishant Sarmukadam 
Cc: Kalle Valo 
Cc: linux-wirel...@vger.kernel.org
---
 drivers/net/wireless/mwifiex/wmm.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/wmm.c 
b/drivers/net/wireless/mwifiex/wmm.c
index 173d3663c2e0..878d358063dc 100644
--- a/drivers/net/wireless/mwifiex/wmm.c
+++ b/drivers/net/wireless/mwifiex/wmm.c
@@ -117,22 +117,15 @@ mwifiex_wmm_allocate_ralist_node(struct mwifiex_adapter 
*adapter, const u8 *ra)
  */
 static u8 mwifiex_get_random_ba_threshold(void)
 {
-   u32 sec, usec;
-   struct timeval ba_tstamp;
-   u8 ba_threshold;
-
+   u64 ns;
/* setup ba_packet_threshold here random number between
 * [BA_SETUP_PACKET_OFFSET,
 * BA_SETUP_PACKET_OFFSET+BA_SETUP_MAX_PACKET_THRESHOLD-1]
 */
+   ns = ktime_get_ns();
+   ns += (ns >> 32) + (ns >> 16);
 
-   do_gettimeofday(&ba_tstamp);
-   sec = (ba_tstamp.tv_sec & 0x) + (ba_tstamp.tv_sec >> 16);
-   usec = (ba_tstamp.tv_usec & 0x) + (ba_tstamp.tv_usec >> 16);
-   ba_threshold = (((sec << 16) + usec) % BA_SETUP_MAX_PACKET_THRESHOLD)
- + BA_SETUP_PACKET_OFFSET;
-
-   return ba_threshold;
+   return ((u8)ns % BA_SETUP_MAX_PACKET_THRESHOLD) + BA_SETUP_PACKET_OFFSET;
 }
 
 /*
-- 
2.1.0.rc2

[PATCH 08/12] nfnetlink: use y2038 safe timestamp

2015-09-30 Thread Arnd Bergmann

The __build_packet_message function fills a nfulnl_msg_packet_timestamp
structure that uses 64-bit seconds and is therefore y2038 safe, but
it uses an intermediate 'struct timespec' which is not.

This trivially changes the code to use 'struct timespec64' instead,
to correct the result on 32-bit architectures.
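
To illustrate why the 64-bit intermediate matters, here is a small
user-space sketch that packs a seconds/nanoseconds pair into two
big-endian 64-bit fields, similar in spirit to what the patched code
does. The helper names put_be64() and pack_timestamp() are hypothetical
and the 16-byte buffer is only a stand-in for the netlink structure.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000L
#define NSEC_PER_USEC 1000L

/* Store a 64-bit value most-significant byte first. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Hypothetical helper: bytes 0..7 hold the seconds, bytes 8..15 the
 * microseconds, both big-endian. Seconds stay 64 bits wide throughout.
 */
static void pack_timestamp(uint8_t out[16], int64_t sec, long nsec)
{
	assert(nsec >= 0 && nsec < NSEC_PER_SEC);
	put_be64(out, (uint64_t)sec);
	put_be64(out + 8, (uint64_t)(nsec / NSEC_PER_USEC));
}
```

With a 32-bit intermediate, any seconds value above the 32-bit range
would already be lost before the big-endian conversion.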

Signed-off-by: Arnd Bergmann 
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: netfilter-de...@vger.kernel.org
Cc: coret...@netfilter.org
---
 net/netfilter/nfnetlink_log.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 4670821b569d..cc2300f4e177 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -538,9 +538,9 @@ __build_packet_message(struct nfnl_log_net *log,
 
if (skb->tstamp.tv64) {
struct nfulnl_msg_packet_timestamp ts;
-   struct timeval tv = ktime_to_timeval(skb->tstamp);
-   ts.sec = cpu_to_be64(tv.tv_sec);
-   ts.usec = cpu_to_be64(tv.tv_usec);
+   struct timespec64 kts = ktime_to_timespec64(skb->tstamp);
+   ts.sec = cpu_to_be64(kts.tv_sec);
+   ts.usec = cpu_to_be64(kts.tv_nsec / NSEC_PER_USEC);
 
if (nla_put(inst->skb, NFULA_TIMESTAMP, sizeof(ts), &ts))
goto nla_put_failure;
-- 
2.1.0.rc2

[PATCH 04/12] mwifiex: use ktime_get_real for timestamping

2015-09-30 Thread Arnd Bergmann

The mwifiex_11n_aggregate_pkt() function creates a ktime_t from
a timeval returned by do_gettimeofday, which is slow and causes
an overflow in 2038 on 32-bit architectures.

This solves both problems by using the appropriate ktime_get_real()
function.

Signed-off-by: Arnd Bergmann 
Cc: Amitkumar Karwar 
Cc: Nishant Sarmukadam 
Cc: Kalle Valo 
Cc: linux-wirel...@vger.kernel.org
---
 drivers/net/wireless/mwifiex/11n_aggr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/11n_aggr.c 
b/drivers/net/wireless/mwifiex/11n_aggr.c
index f7c717253a66..78853c51774d 100644
--- a/drivers/net/wireless/mwifiex/11n_aggr.c
+++ b/drivers/net/wireless/mwifiex/11n_aggr.c
@@ -173,7 +173,6 @@ mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv,
int pad = 0, aggr_num = 0, ret;
struct mwifiex_tx_param tx_param;
struct txpd *ptx_pd = NULL;
-   struct timeval tv;
int headroom = adapter->iface_type == MWIFIEX_USB ? 0 : INTF_HEADER_LEN;
 
skb_src = skb_peek(&pra_list->skb_head);
@@ -203,8 +202,7 @@ mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv,
tx_info_aggr->flags |= MWIFIEX_BUF_FLAG_AGGR_PKT;
skb_aggr->priority = skb_src->priority;
 
-   do_gettimeofday(&tv);
-   skb_aggr->tstamp = timeval_to_ktime(tv);
+   skb_aggr->tstamp = ktime_get_real();
 
do {
/* Check if AMSDU can accommodate this MSDU */
-- 
2.1.0.rc2

[PATCH 01/12] net: fec: avoid timespec use

2015-09-30 Thread Arnd Bergmann

The fec_ptp_enable_pps function uses an open-coded implementation of
ns_to_timespec, which will be removed eventually as it is not y2038-safe
on 32-bit architectures. Two more instances of the same code in this file
were already converted to use the safe ns_to_timespec64 in commit
6630514fcee ("ptp: fec: use helpers for converting ns to timespec"); this
changes the last one as well.

The seconds portion here is actually unused and we could just remove the
timespec variable, but using ns_to_timespec64 can still be better as the
implementation can be hand-optimized in the future.
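
For readers unfamiliar with the helper, the conversion amounts to one
64-bit division plus normalization of the remainder. The following is a
user-space sketch of that idea, not the kernel implementation; the type
'struct ts64' and the function name ns_to_ts64() are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* User-space stand-in for a timespec with 64-bit seconds. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * Split nanoseconds into seconds and a remainder that is always in
 * [0, NSEC_PER_SEC), including for negative inputs.
 */
static struct ts64 ns_to_ts64(int64_t ns)
{
	struct ts64 ts;
	int64_t rem = ns % NSEC_PER_SEC;

	ts.tv_sec = ns / NSEC_PER_SEC;
	if (rem < 0) {
		/* C division truncates toward zero; round down instead. */
		ts.tv_sec--;
		rem += NSEC_PER_SEC;
	}
	ts.tv_nsec = (long)rem;
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts;
}
```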

Signed-off-by: Arnd Bergmann 
Cc: Richard Cochran 
Cc: Fugang Duan 
Cc: Luwei Zhou 
Cc: Frank Li 
---
 drivers/net/ethernet/freescale/fec_ptp.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_ptp.c 
b/drivers/net/ethernet/freescale/fec_ptp.c
index 1543cf0e8ef6..f9e74461bdc0 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -112,9 +112,8 @@ static int fec_ptp_enable_pps(struct fec_enet_private *fep, 
uint enable)
unsigned long flags;
u32 val, tempval;
int inc;
-   struct timespec ts;
+   struct timespec64 ts;
u64 ns;
-   u32 remainder;
val = 0;
 
if (!(fep->hwts_tx_en || fep->hwts_rx_en)) {
@@ -163,8 +162,7 @@ static int fec_ptp_enable_pps(struct fec_enet_private *fep, 
uint enable)
tempval = readl(fep->hwp + FEC_ATIME);
/* Convert the ptp local counter to 1588 timestamp */
ns = timecounter_cyc2time(&fep->tc, tempval);
-   ts.tv_sec = div_u64_rem(ns, 10ULL, &remainder);
-   ts.tv_nsec = remainder;
+   ts = ns_to_timespec64(ns);
 
/* The tempval is  less than 3 seconds, and  so val is less than
 * 4 seconds. No overflow for 32bit calculation.
-- 
2.1.0.rc2

[PATCH 03/12] net: igb: avoid using timespec

2015-09-30 Thread Arnd Bergmann

We want to deprecate the use of 'struct timespec' on 32-bit
architectures, as it will overflow in 2038. The igb
driver uses it to read the current time, and can simply
be changed to use ktime_get_real_ts64() instead.

Because of hardware limitations, there is still an overflow
in year 2106, which we cannot really avoid, but this documents
the overflow.
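
The 2106 limit follows from the register width: an unsigned 32-bit
seconds counter holds values up to 2^32 - 1. A small user-space sketch
of the truncation, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds written to a 32-bit register: only the low 32 bits fit. */
static uint32_t sec_to_hw_reg(int64_t tv_sec)
{
	assert(tv_sec >= 0);
	return (uint32_t)tv_sec;
}

/* Nonzero if the register value no longer equals the real time. */
static int hw_sec_wrapped(int64_t tv_sec)
{
	return (int64_t)sec_to_hw_reg(tv_sec) != tv_sec;
}
```

Values just past the signed 32-bit limit (January 2038) still round-trip
through the register; only values of 2^32 seconds and above (2106) wrap.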

Signed-off-by: Arnd Bergmann 
Cc: Jeff Kirsher 
Cc: intel-wired-...@lists.osuosl.org
---
 drivers/net/ethernet/intel/igb/igb.h  |  4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c | 15 ---
 drivers/net/ethernet/intel/igb/igb_ptp.c  |  8 
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 212d668dabb3..1a2f1cc44b28 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -444,8 +444,8 @@ struct igb_adapter {
 
struct ptp_pin_desc sdp_config[IGB_N_SDP];
struct {
-   struct timespec start;
-   struct timespec period;
+   struct timespec64 start;
+   struct timespec64 period;
} perout[IGB_N_PEROUT];
 
char fw_version[32];
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index e174fbbdba40..911bbadbb994 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5389,7 +5389,7 @@ static void igb_tsync_interrupt(struct igb_adapter 
*adapter)
 {
struct e1000_hw *hw = &adapter->hw;
struct ptp_clock_event event;
-   struct timespec ts;
+   struct timespec64 ts;
u32 ack = 0, tsauxc, sec, nsec, tsicr = rd32(E1000_TSICR);
 
if (tsicr & TSINTR_SYS_WRAP) {
@@ -5409,10 +5409,11 @@ static void igb_tsync_interrupt(struct igb_adapter 
*adapter)
 
if (tsicr & TSINTR_TT0) {
spin_lock(&adapter->tmreg_lock);
-   ts = timespec_add(adapter->perout[0].start,
- adapter->perout[0].period);
+   ts = timespec64_add(adapter->perout[0].start,
+   adapter->perout[0].period);
+   /* u32 conversion of tv_sec is safe until y2106 */
wr32(E1000_TRGTTIML0, ts.tv_nsec);
-   wr32(E1000_TRGTTIMH0, ts.tv_sec);
+   wr32(E1000_TRGTTIMH0, (u32)ts.tv_sec);
tsauxc = rd32(E1000_TSAUXC);
tsauxc |= TSAUXC_EN_TT0;
wr32(E1000_TSAUXC, tsauxc);
@@ -5423,10 +5424,10 @@ static void igb_tsync_interrupt(struct igb_adapter 
*adapter)
 
if (tsicr & TSINTR_TT1) {
spin_lock(&adapter->tmreg_lock);
-   ts = timespec_add(adapter->perout[1].start,
- adapter->perout[1].period);
+   ts = timespec64_add(adapter->perout[1].start,
+   adapter->perout[1].period);
wr32(E1000_TRGTTIML1, ts.tv_nsec);
-   wr32(E1000_TRGTTIMH1, ts.tv_sec);
+   wr32(E1000_TRGTTIMH1, (u32)ts.tv_sec);
tsauxc = rd32(E1000_TSAUXC);
tsauxc |= TSAUXC_EN_TT1;
wr32(E1000_TSAUXC, tsauxc);
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c 
b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 5982f28d521a..c44df87c38de 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -143,7 +143,7 @@ static void igb_ptp_write_i210(struct igb_adapter *adapter,
 * sub-nanosecond resolution.
 */
wr32(E1000_SYSTIML, ts->tv_nsec);
-   wr32(E1000_SYSTIMH, ts->tv_sec);
+   wr32(E1000_SYSTIMH, (u32)ts->tv_sec);
 }
 
 /**
@@ -479,7 +479,7 @@ static int igb_ptp_feature_enable_i210(struct 
ptp_clock_info *ptp,
struct e1000_hw *hw = &igb->hw;
u32 tsauxc, tsim, tsauxc_mask, tsim_mask, trgttiml, trgttimh, freqout;
unsigned long flags;
-   struct timespec ts;
+   struct timespec64 ts;
int use_freq = 0, pin = -1;
s64 ns;
 
@@ -523,14 +523,14 @@ static int igb_ptp_feature_enable_i210(struct 
ptp_clock_info *ptp,
}
ts.tv_sec = rq->perout.period.sec;
ts.tv_nsec = rq->perout.period.nsec;
-   ns = timespec_to_ns(&ts);
+   ns = timespec64_to_ns(&ts);
ns = ns >> 1;
if (on && ns <= 7000LL) {
if (ns < 8LL)
return -EINVAL;
use_freq = 1;
}
-   ts = ns_to_timespec(ns);
+   ts = ns_to_timespec64(ns);
if (rq->perout.index == 1) {
if (use_freq) {
tsauxc_mask = TSAUXC_EN_CLK1 | TSAUXC_ST1;
-- 
2.1.0.rc2

[PATCH 00/12] net: assorted y2038 changes

2015-09-30 Thread Arnd Bergmann

Hi everyone,

This is a set of changes for network drivers and core code to
get rid of the use of time_t and derived data structures.

I have a longer set of patches that enables me to build kernels
with the time_t definition removed completely as a help to find
y2038 overflow issues. This is the subset for networking that
contains all code that has a reasonable way of fixing at the
moment and that is either commonly used (in one of the defconfigs)
or that blocks building a whole subsystem.

Most of the patches in this series should be noncontroversial,
but the last two that I marked [RFC] are a bit tricky and
need input from people that are more familiar with the code than
I am. All 12 patches are independent of one another and can
be applied in any order, so feel free to pick all that look
good.

Patches that are not included here are:

 - disabling less common device drivers that I don't have a fix
   for yet, this includes
drivers/net/ethernet/brocade/bna/bfa_ioc.c
drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
drivers/net/ethernet/tile/tilegx.c
drivers/net/hamradio/baycom_ser_fdx.c
drivers/net/wireless/ath/ath10k/core.h
drivers/net/wireless/ath/ath9k/
drivers/net/wireless/atmel.c
drivers/net/wireless/prism54/isl_38xx.c
drivers/net/wireless/rt2x00/rt2x00debug.c
drivers/net/wireless/rtlwifi/
drivers/net/wireless/ti/wlcore/
drivers/staging/ozwpan/
net/atm/mpoa_caches.c
net/atm/mpoa_proc.c
net/dccp/probe.c
net/ipv4/tcp_probe.c
net/netfilter/nfnetlink_queue_core.c
net/netfilter/xt_time.c
net/openvswitch/flow.c
net/sctp/probe.c
net/sunrpc/auth_gss/
net/sunrpc/svcauth_unix.c
net/vmw_vsock/af_vsock.c
   We'll get there eventually, or we can add a dependency to ensure
   they are not built on 32-bit kernels that need to survive
   beyond 2038. Most of these should be really easy to fix.

 - recvmmsg/sendmmsg system calls: patches have been sent out
   as part of the syscall series, need a little more work and
   review

 - SIOCGSTAMP/SIOCGSTAMPNS/ ioctl calls: tricky, need to discuss
   with some folks at kernel summit

 - SO_RCVTIMEO/SO_SNDTIMEO/SO_TIMESTAMP/SO_TIMESTAMPNS socket
   opt: similar and related to the ioctl

 - mmapped packet socket: need to create v4 of the API, nontrivial

 - pktgen: sends 32-bit timestamps over network, need to find out
   if using unsigned stamps is good enough

 - af_rxrpc: similar to pktgen, uses 32-bit times for deadlines

 - ppp ioctl: patch is being worked on, nontrivial but doable

Arnd

Arnd Bergmann (12):
  net: fec: avoid timespec use
  net: stmmac: avoid using timespec
  net: igb: avoid using timespec
  mwifiex: use ktime_get_real for timestamping
  mwifiex: avoid gettimeofday in ba_threshold setting
  mac80211: use ktime_get_seconds
  atm: hide 'struct zatm_t_hist'
  nfnetlink: use y2038 safe timestamp
  ipv6: use ktime_t for internal timestamps
  net: sctp: avoid incorrect time_t use
  [RFC] ipv4: avoid timespec in timestamp computation
  [RFC] can: avoid using timeval for uapi

 drivers/net/ethernet/freescale/fec_ptp.c  |  6 ++--
 drivers/net/ethernet/intel/igb/igb.h  |  4 +--
 drivers/net/ethernet/intel/igb/igb_main.c | 15 +-
 drivers/net/ethernet/intel/igb/igb_ptp.c  |  8 +++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  8 --
 drivers/net/wireless/mwifiex/11n_aggr.c   |  4 +--
 drivers/net/wireless/mwifiex/wmm.c| 15 +++---
 include/linux/timekeeping.h   |  2 ++
 include/uapi/linux/atm_zatm.h |  3 +-
 include/uapi/linux/can/bcm.h  |  7 -
 kernel/time/timekeeping.c | 34 +++
 net/can/bcm.c | 15 ++
 net/ipv4/icmp.c   |  8 ++
 net/ipv4/ip_options.c |  9 ++
 net/ipv6/mip6.c   | 16 +--
 net/mac80211/sta_info.c   |  8 ++
 net/netfilter/nfnetlink_log.c |  6 ++--
 net/sctp/sm_make_chunk.c  |  2 +-
 net/sctp/sm_statefuns.c   |  2 +-
 19 files changed, 99 insertions(+), 73 deletions(-)

Cc: coret...@netfilter.org
Cc: intel-wired-...@lists.osuosl.org
Cc: linux-...@vger.kernel.org
Cc: linux-atm-gene...@lists.sourceforge.net
Cc: linux-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-wirel...@vger.kernel.org
Cc: netfilter-de...@vger.kernel.org


-- 
2.1.0.rc2

[PATCH 06/12] mac80211: use ktime_get_seconds

2015-09-30 Thread Arnd Bergmann

The mac80211 code uses ktime_get_ts to measure the connected time.
As this uses monotonic time, it is y2038 safe on 32-bit systems,
but we still want to deprecate the use of 'timespec' because most
other users are broken.

This changes the code to use ktime_get_seconds() instead, which
avoids the timespec structure and is slightly more efficient.
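
The pattern is simply a difference of two monotonic second counts. A
user-space analogue, using POSIX clock_gettime(CLOCK_MONOTONIC) in place
of the kernel's ktime_get_seconds(), might look like the sketch below;
the function names are invented for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* User-space analogue of a monotonic seconds counter. */
static int64_t monotonic_seconds(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (int64_t)ts.tv_sec;
}

/* Connected time is the difference of two monotonic readings. */
static int64_t connected_time(int64_t last_connected)
{
	return monotonic_seconds() - last_connected;
}
```

Because the clock is monotonic, the difference cannot go negative, and
wall-clock adjustments do not affect it.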

Signed-off-by: Arnd Bergmann 
Cc: Johannes Berg 
Cc: linux-wirel...@vger.kernel.org
---
 net/mac80211/sta_info.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 64f1936350c6..c3644458e2ee 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -303,7 +303,6 @@ struct sta_info *sta_info_alloc(struct 
ieee80211_sub_if_data *sdata,
struct ieee80211_local *local = sdata->local;
struct ieee80211_hw *hw = &local->hw;
struct sta_info *sta;
-   struct timespec uptime;
int i;
 
sta = kzalloc(sizeof(*sta) + hw->sta_data_size, gfp);
@@ -339,8 +338,7 @@ struct sta_info *sta_info_alloc(struct 
ieee80211_sub_if_data *sdata,
/* Mark TID as unreserved */
sta->reserved_tid = IEEE80211_TID_UNRESERVED;
 
-   ktime_get_ts(&uptime);
-   sta->last_connected = uptime.tv_sec;
+   sta->last_connected = ktime_get_seconds();
ewma_signal_init(&sta->avg_signal);
for (i = 0; i < ARRAY_SIZE(sta->chain_signal_avg); i++)
ewma_signal_init(&sta->chain_signal_avg[i]);
@@ -1813,7 +1811,6 @@ void sta_set_sinfo(struct sta_info *sta, struct 
station_info *sinfo)
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
struct rate_control_ref *ref = NULL;
-   struct timespec uptime;
u32 thr = 0;
int i, ac;
 
@@ -1838,8 +1835,7 @@ void sta_set_sinfo(struct sta_info *sta, struct 
station_info *sinfo)
 BIT(NL80211_STA_INFO_RX_DROP_MISC) |
 BIT(NL80211_STA_INFO_BEACON_LOSS);
 
-   ktime_get_ts(&uptime);
-   sinfo->connected_time = uptime.tv_sec - sta->last_connected;
+   sinfo->connected_time = ktime_get_seconds() - sta->last_connected;
sinfo->inactive_time = jiffies_to_msecs(jiffies - sta->last_rx);
 
if (!(sinfo->filled & (BIT(NL80211_STA_INFO_TX_BYTES64) |
-- 
2.1.0.rc2

[PATCH 02/12] net: stmmac: avoid using timespec

2015-09-30 Thread Arnd Bergmann

We want to deprecate the use of 'struct timespec' on 32-bit
architectures, as it will overflow in 2038. The stmmac
driver uses it to read the current time, and can simply
be changed to use ktime_get_real_ts64() instead.

Because of hardware limitations, there is still an overflow
in year 2106, which we cannot really avoid, but this documents
the overflow.

Signed-off-by: Arnd Bergmann 
Cc: Giuseppe Cavallaro 
Cc: Richard Cochran 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 925f2f8659b8..83a1db1b53f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -424,7 +424,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
 {
struct stmmac_priv *priv = netdev_priv(dev);
struct hwtstamp_config config;
-   struct timespec now;
+   struct timespec64 now;
u64 temp = 0;
u32 ptp_v2 = 0;
u32 tstamp_all = 0;
@@ -621,8 +621,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
 priv->default_addend);
 
/* initialize system time */
-   getnstimeofday(&now);
-   priv->hw->ptp->init_systime(priv->ioaddr, now.tv_sec,
+   ktime_get_real_ts64(&now);
+
+   /* lower 32 bits of tv_sec are safe until y2106 */
+   priv->hw->ptp->init_systime(priv->ioaddr, (u32)now.tv_sec,
now.tv_nsec);
}
 
-- 
2.1.0.rc2

[MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-09-30 Thread Jesper Dangaard Brouer

Make it possible to free a freelist with several objects by adjusting
the API of slab_free() and __slab_free() to take a head, a tail and an
object counter (cnt).

A NULL tail indicates a single-object free of the head object.  This
allows the compiler's inlining and constant propagation in slab_free()
and slab_free_freelist_hook() to avoid adding any overhead in the case
of a single-object free.

This allows a freelist with several objects (all within the same
slab-page) to be freed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().

Object debugging on the free path is also extended to handle these
freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.
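
The count check performed on the debug path can be pictured with the
following user-space sketch. It is a simplified model only: 'struct obj'
and freelist_count() are hypothetical, and the real code locates the
free pointer via get_freepointer() rather than a named member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type: the free pointer lives in the object. */
struct obj {
	struct obj *next_free;
};

/*
 * Walk a detached freelist from head to tail and count the objects.
 * A NULL tail means a single-object free of head.
 */
static int freelist_count(struct obj *head, struct obj *tail)
{
	struct obj *object = head;
	int cnt = 0;

	assert(head != NULL);
	if (!tail)
		tail = head;

	for (;;) {
		cnt++;
		if (object == tail)
			break;
		object = object->next_free;
		/* The tail must be reachable from the head. */
		assert(object != NULL);
	}
	return cnt;
}
```

A caller would compare the returned count against the cnt it was handed
and report a mismatch, which is the role of the bulk_cnt check in the
diff below.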

These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.

Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Alexander Duyck 

---
V4:
 - Change API per req of Christoph Lameter
 - Remove comments in init_object.

V4.1:
 - Took Alex's approach on defines inside slab_free_freelist_hook()

 mm/slub.c |   85 -
 1 file changed, 67 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1cf98d89546d..99fcfa8ed0c7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1063,11 +1063,15 @@ bad:
return 0;
 }
 
+/* Supports checking bulk free of a constructed freelist */
 static noinline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags)
 {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+   void *object = head;
+   int cnt = 0;
 
spin_lock_irqsave(&n->list_lock, *flags);
slab_lock(page);
@@ -1075,6 +1079,9 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (!check_slab(s, page))
goto fail;
 
+next_object:
+   cnt++;
+
if (!check_valid_pointer(s, page, object)) {
slab_err(s, page, "Invalid object pointer 0x%p", object);
goto fail;
@@ -1105,8 +1112,19 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (s->flags & SLAB_STORE_USER)
set_track(s, object, TRACK_FREE, addr);
trace(s, page, object, 0);
+   /* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
init_object(s, object, SLUB_RED_INACTIVE);
+
+   /* Reached end of constructed freelist yet? */
+   if (object != tail) {
+   object = get_freepointer(s, object);
+   goto next_object;
+   }
 out:
+   if (cnt != bulk_cnt)
+   slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
+bulk_cnt, cnt);
+
slab_unlock(page);
/*
 * Keep node_lock to preserve integrity
@@ -1210,7 +1228,8 @@ static inline int alloc_debug_processing(struct 
kmem_cache *s,
struct page *page, void *object, unsigned long addr) { return 0; }
 
 static inline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags) { return NULL; }
 
 static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
@@ -1306,6 +1325,29 @@ static inline void slab_free_hook(struct kmem_cache *s, 
void *x)
kasan_slab_free(s, x);
 }
 
+static inline void slab_free_freelist_hook(struct kmem_cache *s,
+  void *head, void *tail)
+{
+/*
+ * Compiler cannot detect this function can be removed if slab_free_hook()
+ * evaluates to nothing.  Thus, catch all relevant config debug options here.
+ */
+#if defined(CONFIG_KMEMCHECK) ||   \
+   defined(CONFIG_LOCKDEP) ||  \
+   defined(CONFIG_DEBUG_KMEMLEAK) ||   \
+   defined(CONFIG_DEBUG_OBJECTS_FREE) ||   \
+   defined(CONFIG_KASAN)
+
+   void *object = head;
+   void *tail_obj = tail ? : head;
+
+   do {
+   slab_free_hook(s, object);
+   } while ((object != tail_obj) &&
+(object = get_freepointer(s, object)));
+#endif
+}
+
 static void setup_object(struct kmem_cache *s, struct page *page,
void *object)
 {
@@ -2586,10 +2628,11 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
  * handling required then we can return immediately.
  */
 static void __slab_free(struct kmem_cache *s, struct page *page,
-   void *x, unsigned long addr)
+   void *head, void *tail, int cn

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Mathias Krause

On 30 September 2015 at 12:56, Rainer Weikusat
 wrote:
> Mathias Krause  writes:
>> On 29 September 2015 at 21:09, Jason Baron  wrote:
>>> However, if we call connect on socket 's', to connect to a new socket 'o2',
>>> we drop the reference on the original socket 'o'. Thus, we can now close
>>> socket 'o' without unregistering from epoll. Then, when we either close the
>>> ep or unregister 'o', we end up with this list corruption. Thus, this is
>>> not a race per se, but can be triggered sequentially.
>>
>> Sounds profound, but the reproducer calls connect only once per
>> socket. So there is no "connect to a new socket", no?
>> But w/e, see below.
>
> In case you want some information on this: This is a kernel warning I
> could trigger (more than once) on the single day I could so far spend
> looking into this (3.2.54 kernel):
>
> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 
> list_del+0x9/0x30()
> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should 
> be 88022c38f078, but was dead00100100
> [snip]

Is that with Jason's patch or a vanilla v3.2.54?

Regards,
Mathias

Re: [PATCH 11/12] [RFC] ipv4: avoid timespec in timestamp computation

2015-09-30 Thread kbuild test robot

Hi Arnd,

[auto build test results on v4.3-rc3 -- if it's an inappropriate base, please ignore]

config: mn10300-asb2364_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout e1358094d427405462a47d6d2650458b689e55d9
  # save the attached .config to linux build tree
  make.cross ARCH=mn10300 

All error/warnings (new ones prefixed by >>):

   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   kernel/built-in.o: In function `current_thread_info':
   (___ksymtab_gpl+ktime_get_ms_since_midnight+0x0): undefined reference to 
`ktime_get_ms_since_midnight'
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   am33_2.0-linux-ld: Dwarf Error: mangled line number section.
   net/built-in.o: In function `register_netevent_notifier':
>> net/core/netevent.c:29: undefined reference to `ktime_get_ms_since_midnight'
>> net/core/netevent.c:29: undefined reference to `ktime_get_ms_since_midnight'
   net/core/netevent.c:17: undefined reference to `ktime_get_ms_since_midnight'

vim +29 net/core/netevent.c

792d1932 Tom Tucker 2006-07-30  23  /**
792d1932 Tom Tucker 2006-07-30  24   *  register_netevent_notifier - register a 
netevent notifier block
792d1932 Tom Tucker 2006-07-30  25   *  @nb: notifier
792d1932 Tom Tucker 2006-07-30  26   *
792d1932 Tom Tucker 2006-07-30  27   *  Register a notifier to be called when a 
netevent occurs.
792d1932 Tom Tucker 2006-07-30  28   *  The notifier passed is linked into the 
kernel structures and must
792d1932 Tom Tucker 2006-07-30 @29   *  not be reused until it has been 
unregistered. A negative errno code
792d1932 Tom Tucker 2006-07-30  30   *  is returned on a failure.
792d1932 Tom Tucker 2006-07-30  31   */
792d1932 Tom Tucker 2006-07-30  32  int register_netevent_notifier(struct 
notifier_block *nb)

:: The code at line 29 was first introduced by commit
:: 792d1932e319ff8ba01361e7d151b1794c55c31f [NET]: Network Event Notifier 
Mechanism.

:: TO: Tom Tucker 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[RFC PATCH] ipv4: ktime_get_ms_of_day() can be static

2015-09-30 Thread kbuild test robot


Signed-off-by: Fengguang Wu 
---
 timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5611a6d..46b847a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -847,7 +847,7 @@ time64_t ktime_get_real_seconds(void)
 EXPORT_SYMBOL_GPL(ktime_get_real_seconds);
 
 #if IS_ENABLED(CONFIG_INET)
-u32 ktime_get_ms_of_day(void)
+static u32 ktime_get_ms_of_day(void)
 {
struct timekeeper *tk = &tk_core.timekeeper;
struct timespec64 now;

Re: [PATCH 11/12] [RFC] ipv4: avoid timespec in timestamp computation

2015-09-30 Thread kbuild test robot

Hi Arnd,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

reproduce:
  # apt-get install sparse
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> kernel/time/timekeeping.c:850:5: sparse: symbol 'ktime_get_ms_of_day' was 
>> not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Re: [Y2038] [PATCH 11/12] [RFC] ipv4: avoid timespec in timestamp computation

2015-09-30 Thread Arnd Bergmann

On Wednesday 30 September 2015 19:55:43 kbuild test robot wrote:
> Hi Arnd,
> 
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
> ignore]
> 
> config: mn10300-asb2364_defconfig (attached as .config)
> reproduce:
>   wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout e1358094d427405462a47d6d2650458b689e55d9
>   # save the attached .config to linux build tree
>   make.cross ARCH=mn10300 
> 
> All error/warnings (new ones prefixed by >>):
> 
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    kernel/built-in.o: In function `current_thread_info':
>    (___ksymtab_gpl+ktime_get_ms_since_midnight+0x0): undefined reference to 
> `ktime_get_ms_since_midnight'
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    am33_2.0-linux-ld: Dwarf Error: mangled line number section.
>    net/built-in.o: In function `register_netevent_notifier':
> >> net/core/netevent.c:29: undefined reference to 
> >> `ktime_get_ms_since_midnight'
> >> net/core/netevent.c:29: undefined reference to 
> >> `ktime_get_ms_since_midnight'
>    net/core/netevent.c:17: undefined reference to 
> `ktime_get_ms_since_midnight'

Thanks for the helpful report, I love the new feature of grabbing the patches
from the list.

Moreover, sorry for screwing up here, I changed the function name before
sending it out and only compiled the files individually but did not notice
that I only changed it in three out of four places because I did not
try to relink the kernel.

I'll send an updated version.

Arnd

[net-next 10/17] i40e: Remove useless message

2015-09-30 Thread Jeff Kirsher

From: Greg Rose 

Remove a useless message that blathers on whenever a vxlan port is deleted.

Change-ID: If63fb8cf38e56cf433b68e498f11389de51919ba
Signed-off-by: Greg Rose 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 940744a..032df6d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8175,9 +8175,6 @@ static void i40e_del_vxlan_port(struct net_device *netdev,
pf->vxlan_ports[idx] = 0;
pf->pending_vxlan_bitmap |= BIT_ULL(idx);
pf->flags |= I40E_FLAG_VXLAN_FILTER_SYNC;
-
-   dev_info(&pf->pdev->dev, "deleting vxlan port %d\n",
-ntohs(port));
} else {
netdev_warn(netdev, "vxlan port %d was not found, not 
deleting\n",
ntohs(port));
-- 
2.4.3


[net-next 07/17] i40e: count drops in netstat interface

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

The i40e rx_dropped counter was not showing up in netstat -i.
Add the right counter to be updated with the stats.

Change-ID: I4dd552e9995836099184f9d9a08e90edb591155f
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index dce7d85..3a3d49c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -474,6 +474,7 @@ static struct rtnl_link_stats64 
*i40e_get_netdev_stats_struct(
stats->tx_errors    = vsi_stats->tx_errors;
stats->tx_dropped   = vsi_stats->tx_dropped;
stats->rx_errors    = vsi_stats->rx_errors;
+   stats->rx_dropped   = vsi_stats->rx_dropped;
stats->rx_crc_errors= vsi_stats->rx_crc_errors;
stats->rx_length_errors = vsi_stats->rx_length_errors;
 
-- 
2.4.3


[net-next 09/17] i40e: limit debugfs io ops

2015-09-30 Thread Jeff Kirsher

From: Shannon Nelson 

Don't let the debugfs register read and write commands try to access
outside of the ioremapped space.  While we're at it, remove the use of
a misleading constant.

Change-ID: Ifce2893e232c65c7a76c23532c658f298218a81b
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  3 ++-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 12 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c |  9 -
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index c64d18d..0044cb0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -71,7 +71,6 @@
 #define I40E_MAX_VEB  16
 
 #define I40E_MAX_NUM_DESCRIPTORS  4096
-#define I40E_MAX_REGISTER 0x80
 #define I40E_MAX_CSR_SPACE (4 * 1024 * 1024 - 64 * 1024)
 #define I40E_DEFAULT_NUM_DESCRIPTORS  512
 #define I40E_REQ_DESCRIPTOR_MULTIPLE  32
@@ -408,6 +407,8 @@ struct i40e_pf {
/* These are only valid in NPAR modes */
u32 npar_max_bw;
u32 npar_min_bw;
+
+   u32 ioremap_len;
 };
 
 struct i40e_mac_filter {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 508efb0..ee96106 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1495,9 +1495,9 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
}
 
/* check the range on address */
-   if (address >= I40E_MAX_REGISTER) {
-   dev_info(&pf->pdev->dev, "read reg address 0x%08x too 
large\n",
-address);
+   if (address > (pf->ioremap_len - sizeof(u32))) {
+   dev_info(&pf->pdev->dev, "read reg address 0x%08x too 
large, max=0x%08lx\n",
+address, (pf->ioremap_len - sizeof(u32)));
goto command_write_done;
}
 
@@ -1514,9 +1514,9 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
}
 
/* check the range on address */
-   if (address >= I40E_MAX_REGISTER) {
-   dev_info(&pf->pdev->dev, "write reg address 0x%08x too 
large\n",
-address);
+   if (address > (pf->ioremap_len - sizeof(u32))) {
+   dev_info(&pf->pdev->dev, "write reg address 0x%08x too 
large, max=0x%08lx\n",
+address, (pf->ioremap_len - sizeof(u32)));
goto command_write_done;
}
wr32(&pf->hw, address, value);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3a3d49c..940744a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9939,7 +9939,6 @@ static void i40e_print_features(struct i40e_pf *pf)
 static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
struct i40e_aq_get_phy_abilities_resp abilities;
-   unsigned long ioremap_len;
struct i40e_pf *pf;
struct i40e_hw *hw;
static u16 pfs_found;
@@ -9992,15 +9991,15 @@ static int i40e_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
hw = &pf->hw;
hw->back = pf;
 
-   ioremap_len = min_t(unsigned long, pci_resource_len(pdev, 0),
-   I40E_MAX_CSR_SPACE);
+   pf->ioremap_len = min_t(int, pci_resource_len(pdev, 0),
+   I40E_MAX_CSR_SPACE);
 
-   hw->hw_addr = ioremap(pci_resource_start(pdev, 0), ioremap_len);
+   hw->hw_addr = ioremap(pci_resource_start(pdev, 0), pf->ioremap_len);
if (!hw->hw_addr) {
err = -EIO;
dev_info(&pdev->dev, "ioremap(0x%04x, 0x%04x) failed: 0x%x\n",
 (unsigned int)pci_resource_start(pdev, 0),
-(unsigned int)pci_resource_len(pdev, 0), err);
+pf->ioremap_len, err);
goto err_ioremap;
}
hw->vendor_id = pdev->vendor;
-- 
2.4.3
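
As a rough illustration of the range check that the debugfs patch above
introduces, here is a stand-alone sketch. It is not driver code:
reg_offset_ok() and its parameters are hypothetical names. A 32-bit
register access is accepted only when all four bytes fall inside the
mapped window. Unlike the driver hunk, the sketch also rejects a window
shorter than one register, because "len - sizeof(u32)" would wrap around
in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of i40e: returns true when a 32-bit
 * access at byte offset 'address' lies entirely inside a mapped
 * window of 'ioremap_len' bytes.
 */
bool reg_offset_ok(uint32_t address, uint32_t ioremap_len)
{
	/* A window smaller than one register cannot hold any access;
	 * checking this first avoids unsigned wrap-around below.
	 */
	if (ioremap_len < sizeof(uint32_t))
		return false;

	/* The last valid offset is ioremap_len - 4, which mirrors the
	 * patch's "address > (pf->ioremap_len - sizeof(u32))" rejection.
	 */
	return address <= ioremap_len - sizeof(uint32_t);
}
```

With a 4096-byte window, offsets 0 through 4092 pass and 4093 onward are
rejected.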


[net-next 05/17] i40e: fixup padding issue in get_cee_dcb_cfg_v1_resp

2015-09-30 Thread Jeff Kirsher

From: Shannon Nelson 

The struct i40e_aqc_get_cee_dcb_cfg_v1_resp was originally defined with
word boundary layout issues, which most compilers deal with by silently
adding padding, making the actual struct larger than designed.
This patch adds an extra byte in fields reserved3 and reserved4 to directly
acknowledge that padding.

Because the struct doesn't actually change in size or layout, this doesn't
constitute a change in the API.

Change-ID: I53fa4741b73fa255621232a85fba000b0e223015
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 95d23bf..b840fab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -2074,6 +2074,15 @@ I40E_CHECK_CMD_LENGTH(i40e_aqc_lldp_start);
 #define I40E_AQC_CEE_ISCSI_STATUS_MASK (0x7 << I40E_AQC_CEE_ISCSI_STATUS_SHIFT)
 #define I40E_AQC_CEE_FIP_STATUS_SHIFT  0x10
 #define I40E_AQC_CEE_FIP_STATUS_MASK   (0x7 << I40E_AQC_CEE_FIP_STATUS_SHIFT)
+
+/* struct i40e_aqc_get_cee_dcb_cfg_v1_resp was originally defined with
+ * word boundary layout issues, which the Linux compilers silently deal
+ * with by adding padding, making the actual struct larger than designed.
+ * However, the FW compiler for the NIC is less lenient and complains
+ * about the struct.  Hence, the struct defined here has an extra byte in
+ * fields reserved3 and reserved4 to directly acknowledge that padding,
+ * and the new length is used in the length check macro.
+ */
 struct i40e_aqc_get_cee_dcb_cfg_v1_resp {
u8  reserved1;
u8  oper_num_tc;
@@ -2081,9 +2090,9 @@ struct i40e_aqc_get_cee_dcb_cfg_v1_resp {
u8  reserved2;
u8  oper_tc_bw[8];
u8  oper_pfc_en;
-   u8  reserved3;
+   u8  reserved3[2];
__le16  oper_app_prio;
-   u8  reserved4;
+   u8  reserved4[2];
__le16  tlv_status;
 };
 
-- 
2.4.3
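
The padding argument in the commit message above can be checked with a
small sketch. The two structs below are hypothetical, reduced versions of
the tail of the response struct (they are not the real admin queue
layout), and the result assumes an ABI where a 16-bit integer is aligned
to 2 bytes, which holds on the common Linux targets but is not guaranteed
by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced struct: single reserved bytes, as before the patch. */
struct tail_implicit {
	uint8_t  reserved3;      /* offset 0, compiler pads offset 1 */
	uint16_t oper_app_prio;  /* offset 2 */
	uint8_t  reserved4;      /* offset 4, compiler pads offset 5 */
	uint16_t tlv_status;     /* offset 6 */
};

/* Hypothetical reduced struct: padding spelled out, as after the patch. */
struct tail_explicit {
	uint8_t  reserved3[2];   /* offsets 0-1 */
	uint16_t oper_app_prio;  /* offset 2 */
	uint8_t  reserved4[2];   /* offsets 4-5 */
	uint16_t tlv_status;     /* offset 6 */
};

/* Size and field offsets are the same, so the wire layout is unchanged. */
_Static_assert(sizeof(struct tail_implicit) == sizeof(struct tail_explicit),
	       "explicit padding must not change the struct size");
_Static_assert(offsetof(struct tail_implicit, oper_app_prio) ==
	       offsetof(struct tail_explicit, oper_app_prio),
	       "oper_app_prio must stay at the same offset");
_Static_assert(offsetof(struct tail_implicit, tlv_status) ==
	       offsetof(struct tail_explicit, tlv_status),
	       "tlv_status must stay at the same offset");
```

If the file compiles, the static assertions have confirmed that making
the padding explicit leaves size and offsets as they were on that target.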


[net-next 01/17] i40evf: missing rtnl_unlock in i40evf_resume()

2015-09-30 Thread Jeff Kirsher

From: Vasily Averin 

Signed-off-by: Vasily Averin 
Tested-by: Andrews Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5fc8204..49fa71c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2510,6 +2510,7 @@ static int i40evf_resume(struct pci_dev *pdev)
rtnl_lock();
err = i40evf_set_interrupt_capability(adapter);
if (err) {
+   rtnl_unlock();
dev_err(&pdev->dev, "Cannot enable MSI-X interrupts.\n");
return err;
}
-- 
2.4.3


[net-next 08/17] i40e: use QOS field consistently

2015-09-30 Thread Jeff Kirsher

From: Mitch Williams 

In i40e_ndo_set_vf_port_vlan, we were using the QOS value
inconsistently, sometimes shifting it, sometimes not. Do the shift-and-
or operation correctly, once, and use the result consistently everywhere
in the function.

Change-ID: I46f062f3edc90a8a017ecec9137f4d1ab0ab9e41
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index eacce93..b148694 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2089,6 +2089,7 @@ error_param:
 int i40e_ndo_set_vf_port_vlan(struct net_device *netdev,
  int vf_id, u16 vlan_id, u8 qos)
 {
+   u16 vlanprio = vlan_id | (qos << I40E_VLAN_PRIORITY_SHIFT);
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_pf *pf = np->vsi->back;
struct i40e_vsi *vsi;
@@ -2116,8 +2117,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev,
goto error_pvid;
}
 
-   if (le16_to_cpu(vsi->info.pvid) ==
-   (vlan_id | (qos << I40E_VLAN_PRIORITY_SHIFT)))
+   if (le16_to_cpu(vsi->info.pvid) == vlanprio)
/* duplicate request, so just return success */
goto error_pvid;
 
@@ -2141,7 +2141,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev,
 * MAC addresses deleted.
 */
if ((!(vlan_id || qos) ||
-   (vlan_id | qos) != le16_to_cpu(vsi->info.pvid)) &&
+   vlanprio != le16_to_cpu(vsi->info.pvid)) &&
vsi->info.pvid)
ret = i40e_vsi_add_vlan(vsi, I40E_VLAN_ANY);
 
@@ -2156,8 +2156,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev,
}
}
if (vlan_id || qos)
-   ret = i40e_vsi_add_pvid(vsi,
-   vlan_id | (qos << I40E_VLAN_PRIORITY_SHIFT));
+   ret = i40e_vsi_add_pvid(vsi, vlanprio);
else
i40e_vsi_remove_pvid(vsi);
 
-- 
2.4.3
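
To see why the unshifted comparison removed by the patch above was a bug,
consider this minimal sketch of the shift-and-or. The shift value is an
assumption taken from the 802.1Q tag layout (3-bit priority in the top
bits of the 16-bit tag); the driver's I40E_VLAN_PRIORITY_SHIFT may be
defined differently, and make_vlanprio() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed for illustration only: 802.1Q keeps the priority in bits 15:13.
 * This is not necessarily the value of I40E_VLAN_PRIORITY_SHIFT.
 */
#define EXAMPLE_VLAN_PRIORITY_SHIFT 13

/* Hypothetical helper: combine a VLAN ID and a QoS priority once, so
 * every later comparison and assignment uses the same encoding.
 */
uint16_t make_vlanprio(uint16_t vlan_id, uint8_t qos)
{
	return (uint16_t)(vlan_id | ((uint16_t)qos << EXAMPLE_VLAN_PRIORITY_SHIFT));
}
```

For vlan_id 100 and qos 5 the combined value is 41060, whereas the
unshifted "vlan_id | qos" is 101, so a comparison against a stored pvid
built with the shift would never match.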


[net-next 12/17] i40e: Strip VEB stats if they are disabled in HW

2015-09-30 Thread Jeff Kirsher

From: Anjali Singhai Jain 

Due to performance reasons, VEB stats have been disabled in the hw. This
patch adds code to check for that condition before accumulating these
stats.

Change-ID: I7d805669476fedabb073790403703798ae5d878e
Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  9 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 13 +
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 0044cb0..f6d97ad 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -325,6 +325,7 @@ struct i40e_pf {
 #define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE   BIT_ULL(33)
 #define I40E_FLAG_128_QP_RSS_CAPABLE   BIT_ULL(34)
 #define I40E_FLAG_WB_ON_ITR_CAPABLE BIT_ULL(35)
+#define I40E_FLAG_VEB_STATS_ENABLED BIT_ULL(37)
 #define I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE  BIT_ULL(38)
 #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40)
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index dd2b620..1345de2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1264,7 +1264,8 @@ static int i40e_get_sset_count(struct net_device *netdev, 
int sset)
if (vsi == pf->vsi[pf->lan_vsi] && pf->hw.partition_id == 1) {
int len = I40E_PF_STATS_LEN(netdev);
 
-   if (pf->lan_veb != I40E_NO_VEB)
+   if ((pf->lan_veb != I40E_NO_VEB) &&
+   (pf->flags & I40E_FLAG_VEB_STATS_ENABLED))
len += I40E_VEB_STATS_TOTAL;
return len;
} else {
@@ -1337,7 +1338,8 @@ static void i40e_get_ethtool_stats(struct net_device 
*netdev,
if (vsi != pf->vsi[pf->lan_vsi] || pf->hw.partition_id != 1)
return;
 
-   if (pf->lan_veb != I40E_NO_VEB) {
+   if ((pf->lan_veb != I40E_NO_VEB) &&
+   (pf->flags & I40E_FLAG_VEB_STATS_ENABLED)) {
struct i40e_veb *veb = pf->veb[pf->lan_veb];
for (j = 0; j < I40E_VEB_STATS_LEN; j++) {
p = (char *)veb;
@@ -1410,7 +1412,8 @@ static void i40e_get_strings(struct net_device *netdev, 
u32 stringset,
if (vsi != pf->vsi[pf->lan_vsi] || pf->hw.partition_id != 1)
return;
 
-   if (pf->lan_veb != I40E_NO_VEB) {
+   if ((pf->lan_veb != I40E_NO_VEB) &&
+   (pf->flags & I40E_FLAG_VEB_STATS_ENABLED)) {
for (i = 0; i < I40E_VEB_STATS_LEN; i++) {
snprintf(p, ETH_GSTRING_LEN, "veb.%s",
i40e_gstrings_veb_stats[i].stat_string);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8952ab7..43e21bd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5907,10 +5907,12 @@ static void i40e_watchdog_subtask(struct i40e_pf *pf)
if (pf->vsi[i] && pf->vsi[i]->netdev)
i40e_update_stats(pf->vsi[i]);
 
-   /* Update the stats for the active switching components */
-   for (i = 0; i < I40E_MAX_VEB; i++)
-   if (pf->veb[i])
-   i40e_update_veb_stats(pf->veb[i]);
+   if (pf->flags & I40E_FLAG_VEB_STATS_ENABLED) {
+   /* Update the stats for the active switching components */
+   for (i = 0; i < I40E_MAX_VEB; i++)
+   if (pf->veb[i])
+   i40e_update_veb_stats(pf->veb[i]);
+   }
 
i40e_ptp_rx_hang(pf->vsi[pf->lan_vsi]);
 }
@@ -7998,6 +8000,9 @@ static int i40e_sw_init(struct i40e_pf *pf)
pf->lan_veb = I40E_NO_VEB;
pf->lan_vsi = I40E_NO_VSI;
 
+   /* By default FW has this off for performance reasons */
+   pf->flags &= ~I40E_FLAG_VEB_STATS_ENABLED;
+
/* set up queue assignment tracking */
size = sizeof(struct i40e_lump_tracking)
+ (sizeof(u16) * pf->hw.func_caps.num_tx_qp);
-- 
2.4.3
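
The gating added by the VEB stats patch above follows one pattern in all
three ethtool callbacks: VEB entries are counted only when a VEB exists
and the stats flag is set. A minimal sketch of that pattern, with
hypothetical names (EXAMPLE_FLAG_VEB_STATS_ENABLED, example_stats_len)
and plain integers standing in for the driver's length macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the driver flag; bit 37 needs a 64-bit flags word. */
#define EXAMPLE_FLAG_VEB_STATS_ENABLED (1ULL << 37)

/* Hypothetical helper: number of stats entries to report. VEB entries
 * are included only when a VEB is present AND the flag is set.
 */
int example_stats_len(int base_len, int veb_len, bool has_veb, uint64_t flags)
{
	if (has_veb && (flags & EXAMPLE_FLAG_VEB_STATS_ENABLED))
		return base_len + veb_len;

	return base_len;
}
```

Keeping the count, the values and the strings behind the same condition
matters because ethtool pairs them up by index.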


[net-next 16/17] i40e: fix kbuild warnings

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

The 0day build infrastructure found some issues in i40e; this
removes the warnings by adding a harmless cast to a dev_info.

CC: kbuild-...@01.org
Signed-off-by: Jesse Brandeburg 
Reported-by: kbuild test robot 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index ee96106..9f9d842 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1497,7 +1497,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
/* check the range on address */
if (address > (pf->ioremap_len - sizeof(u32))) {
dev_info(&pf->pdev->dev, "read reg address 0x%08x too 
large, max=0x%08lx\n",
-address, (pf->ioremap_len - sizeof(u32)));
+address, (unsigned long int)(pf->ioremap_len - 
sizeof(u32)));
goto command_write_done;
}
 
@@ -1516,7 +1516,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
/* check the range on address */
if (address > (pf->ioremap_len - sizeof(u32))) {
dev_info(&pf->pdev->dev, "write reg address 0x%08x too 
large, max=0x%08lx\n",
-address, (pf->ioremap_len - sizeof(u32)));
+address, (unsigned long int)(pf->ioremap_len - 
sizeof(u32)));
goto command_write_done;
}
wr32(&pf->hw, address, value);
-- 
2.4.3


[net-next 06/17] i40e/i40evf: fix Tx hang workaround code

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

The arm writeback (arm_wb) code is used for kicking the Tx ring to
make sure any pending work is completed even if interrupts are
disabled. It was running when it didn't need to, and not clearing
the ring->arm_wb state after it was set.  This caused Tx hangs
to still occur occasionally when there really was no hang.
Fix this by resetting the variable right after it was used.

Change-ID: I7bf75d552ba9c4bd203d40615213861a24bb5594
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3ce4900..47dba9b0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1842,6 +1842,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
i40e_for_each_ring(ring, q_vector->tx) {
clean_complete &= i40e_clean_tx_irq(ring, vsi->work_limit);
arm_wb |= ring->arm_wb;
+   ring->arm_wb = false;
}
 
/* We attempt to distribute budget to each Rx queue fairly, but don't
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 8309793..aaee89f 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -254,8 +254,6 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, 
int budget)
!test_bit(__I40E_DOWN, &tx_ring->vsi->state) &&
(I40E_DESC_UNUSED(tx_ring) != tx_ring->count))
tx_ring->arm_wb = true;
-   else
-   tx_ring->arm_wb = false;
 
netdev_tx_completed_queue(netdev_get_tx_queue(tx_ring->netdev,
  tx_ring->queue_index),
@@ -1288,6 +1286,7 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
i40e_for_each_ring(ring, q_vector->tx) {
clean_complete &= i40e_clean_tx_irq(ring, vsi->work_limit);
arm_wb |= ring->arm_wb;
+   ring->arm_wb = false;
}
 
/* We attempt to distribute budget to each Rx queue fairly, but don't
-- 
2.4.3
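
The fix in the Tx hang patch above is a read-and-clear pattern: the poll
loop consumes each ring's arm_wb request and resets it immediately, so a
stale request cannot trigger another writeback on a later poll. A
stand-alone sketch, where struct example_ring and collect_arm_wb() are
hypothetical simplifications of the driver's ring and NAPI poll loop:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring with only the field that matters here. */
struct example_ring {
	bool arm_wb;
};

/* Returns true if any ring asked for a writeback, and clears every
 * request as it is read so the next poll starts from a clean state.
 */
bool collect_arm_wb(struct example_ring *rings, size_t count)
{
	bool arm_wb = false;

	for (size_t i = 0; i < count; i++) {
		arm_wb |= rings[i].arm_wb;
		rings[i].arm_wb = false;	/* consume the request */
	}

	return arm_wb;
}
```

Calling it twice in a row on the same rings returns true at most once,
unless a ring sets arm_wb again in between.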


[net-next 11/17] i40e/i40evf: add new device id 1588

2015-09-30 Thread Jeff Kirsher

From: Shannon Nelson 

Add new device id and support for another 20Gb device.

Change-ID: Ib1b61e5bb6201d84953f97cade39a6e3369c2cf2
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c  | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h   | 1 +
 6 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 114dc64..80c354c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -52,6 +52,7 @@ static i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_QSFP_C:
case I40E_DEV_ID_10G_BASE_T:
case I40E_DEV_ID_20G_KR2:
+   case I40E_DEV_ID_20G_KR2_A:
hw->mac.type = I40E_MAC_XL710;
break;
case I40E_DEV_ID_SFP_X722:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index e972b5e..dd2b620 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -437,6 +437,7 @@ static void i40e_get_settings_link_down(struct i40e_hw *hw,
ecmd->advertising |= ADVERTISED_100baseT_Full;
break;
case I40E_DEV_ID_20G_KR2:
+   case I40E_DEV_ID_20G_KR2_A:
/* backplane 20G */
ecmd->supported = SUPPORTED_20000baseKR2_Full;
ecmd->advertising = ADVERTISED_20000baseKR2_Full;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 032df6d..8952ab7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -79,6 +79,8 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_1G_BASE_T_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_X722), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2_A), 0},
/* required last entry */
{0, }
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h 
b/drivers/net/ethernet/intel/i40e/i40e_type.h
index af48290..c5b6a65 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -45,6 +45,7 @@
 #define I40E_DEV_ID_QSFP_C 0x1585
 #define I40E_DEV_ID_10G_BASE_T 0x1586
 #define I40E_DEV_ID_20G_KR2 0x1587
+#define I40E_DEV_ID_20G_KR2_A  0x1588
 #define I40E_DEV_ID_VF 0x154C
 #define I40E_DEV_ID_VF_HV  0x1571
 #define I40E_DEV_ID_SFP_X722   0x37D0
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index d45d0ae..1950db1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -52,6 +52,7 @@ i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_QSFP_C:
case I40E_DEV_ID_10G_BASE_T:
case I40E_DEV_ID_20G_KR2:
+   case I40E_DEV_ID_20G_KR2_A:
hw->mac.type = I40E_MAC_XL710;
break;
case I40E_DEV_ID_SFP_X722:
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h 
b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index ed71666..37bacc3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -45,6 +45,7 @@
 #define I40E_DEV_ID_QSFP_C 0x1585
 #define I40E_DEV_ID_10G_BASE_T 0x1586
 #define I40E_DEV_ID_20G_KR2 0x1587
+#define I40E_DEV_ID_20G_KR2_A  0x1588
 #define I40E_DEV_ID_VF 0x154C
 #define I40E_DEV_ID_VF_HV  0x1571
 #define I40E_DEV_ID_SFP_X722   0x37D0
-- 
2.4.3


[net-next 15/17] i40evf: tweak init timing

2015-09-30 Thread Jeff Kirsher

From: Mitch Williams 

This patch tweaks the init timing of the driver just a little bit to
increase stability on load/unload and SR-IOV enable/disable cycles.

First, run the init_task loop a little quicker in order to reduce
overall init time.

Second, stagger the start of the init task based on the device's
PCIe function ID. This lessens the impact on the firmware when a
whole bunch of VFs are initialized simultaneously, e.g. enabling
SR-IOV without the VF driver blacklisted. For single VFs assigned
to VMs this will have no effect as the function ID will always be 0.

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 49fa71c..76df6b2 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2300,8 +2300,7 @@ static void i40evf_init_task(struct work_struct *work)
}
return;
 restart:
-   schedule_delayed_work(&adapter->init_task,
- msecs_to_jiffies(50));
+   schedule_delayed_work(&adapter->init_task, msecs_to_jiffies(30));
return;
 
 err_register:
@@ -2434,7 +2433,8 @@ static int i40evf_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
INIT_WORK(&adapter->adminq_task, i40evf_adminq_task);
INIT_WORK(&adapter->watchdog_task, i40evf_watchdog_task);
INIT_DELAYED_WORK(&adapter->init_task, i40evf_init_task);
-   schedule_delayed_work(&adapter->init_task, 10);
+   schedule_delayed_work(&adapter->init_task,
+ msecs_to_jiffies(5 * (pdev->devfn & 0x07)));
 
return 0;
 
-- 
2.4.3
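
The staggering described in the commit message can be sketched in plain C. This is only an illustration of the arithmetic in the patch (5 ms per PCIe function number, taken from the low three bits of devfn); the names init_stagger_ms, DEVFN_FUNC_MASK and INIT_STAGGER_MS are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* PCI devfn layout: bits 7:3 hold the device (slot), bits 2:0 the function. */
#define DEVFN_FUNC_MASK 0x07u
/* Per-function delay step in milliseconds, the value used in the patch. */
#define INIT_STAGGER_MS 5u

/* Return the initial init-task delay in milliseconds for a given devfn. */
unsigned int init_stagger_ms(uint8_t devfn)
{
	unsigned int delay = INIT_STAGGER_MS * (devfn & DEVFN_FUNC_MASK);

	/* Functions 0..7 can only map to 0..35 ms. */
	assert(delay <= 35u);
	return delay;
}
```

With this mapping, function 0 starts immediately and function 7 starts 35 ms later, so eight VFs on the same device are spread over a 35 ms window instead of all hitting the firmware at once.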


[net-next 17/17] i40e: fix 32 bit build warnings

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

Sparse found some issues with 32 bit compilation, which probably should
at least work without warning.  Not only that, but the code was wrong.
Thanks sparse!!

And thanks to the kbuild robot zero day testing for finding this issue.

$ make ARCH=i386 M=drivers/net/ethernet/intel/i40e C=2 CF="-D__CHECK_ENDIAN__"
  CHECK   drivers/net/ethernet/intel/i40e/i40e_main.c
  include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer
  drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (32) for type unsigned long
  drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (42) for type unsigned long
  drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (39) for type unsigned long
  drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (40) for type unsigned long

CC: kbuild-...@01.org
Signed-off-by: Jesse Brandeburg 
Reported-by: kbuild test robot 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c |  5 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   | 12 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 80c354c..6833717 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -442,9 +442,6 @@ static i40e_status i40e_aq_get_set_rss_lut(struct i40e_hw *hw,
I40E_AQC_SET_RSS_LUT_TABLE_TYPE_SHIFT) &
I40E_AQC_SET_RSS_LUT_TABLE_TYPE_MASK));
 
-   cmd_resp->addr_high = cpu_to_le32(high_16_bits((u64)lut));
-   cmd_resp->addr_low = cpu_to_le32(lower_32_bits((u64)lut));
-
status = i40e_asq_send_command(hw, &desc, lut, lut_size, NULL);
 
return status;
@@ -519,8 +516,6 @@ static i40e_status i40e_aq_get_set_rss_key(struct i40e_hw *hw,
  I40E_AQC_SET_RSS_KEY_VSI_ID_SHIFT) &
  I40E_AQC_SET_RSS_KEY_VSI_ID_MASK));
cmd_resp->vsi_id |= cpu_to_le16((u16)I40E_AQC_SET_RSS_KEY_VSI_VALID);
-   cmd_resp->addr_high = cpu_to_le32(high_16_bits((u64)key));
-   cmd_resp->addr_low = cpu_to_le32(lower_32_bits((u64)key));
 
status = i40e_asq_send_command(hw, &desc, key, key_size, NULL);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index a3978c2..7c9975c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -79,12 +79,12 @@ enum i40e_dyn_idx_t {
BIT_ULL(I40E_FILTER_PCTYPE_L2_PAYLOAD))
 
 #define I40E_DEFAULT_RSS_HENA_EXPANDED (I40E_DEFAULT_RSS_HENA | \
-   BIT(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK) | \
-   BIT(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) | \
-   BIT(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP) | \
-   BIT(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK) | \
-   BIT(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) | \
-   BIT(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK) | \
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) | \
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP) | \
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK) | \
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) | \
+   BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))
 
 #define i40e_pf_get_default_rss_hena(pf) \
(((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
-- 
2.4.3
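
The sparse warnings above come from shifting a 32-bit wide unsigned long by 32 or more positions, which is what BIT() does on a 32-bit build. A minimal user-space sketch of the fix, assuming a stand-in macro SKETCH_BIT_ULL and helper names (build_mask64, low_word) that are not part of the driver, and reusing the bit positions reported by sparse:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's BIT_ULL(): the constant is always 64 bits wide. */
#define SKETCH_BIT_ULL(nr) (1ULL << (nr))

/* OR together a list of bit positions (each below 64) into a 64-bit mask. */
uint64_t build_mask64(const unsigned int *bits, size_t count)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(bits[i] < 64);
		mask |= SKETCH_BIT_ULL(bits[i]);
	}
	return mask;
}

/* What a 32-bit wide type could keep of such a mask: the low word only. */
uint32_t low_word(uint64_t mask)
{
	return (uint32_t)mask;
}
```

For the positions 32, 39, 40 and 42, the resulting mask lives entirely in the upper word, so none of those bits could be represented in a 32-bit unsigned long.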


[net-next 13/17] i40e: refactor interrupt enable

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

The interrupt enable function was always making the caller add
the base_vector from the VSI struct which is already passed to
the function. Just collapse the math into the helper function.

Change-ID: I54ef33aa7ceebc3231c3cc48f7b39fd0c3ff5806
Signed-off-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |  6 ++
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 43e21bd..c9a7dfa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3066,7 +3066,7 @@ void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf)
 /**
  * i40e_irq_dynamic_enable - Enable default interrupt generation settings
  * @vsi: pointer to a vsi
- * @vector: enable a particular Hw Interrupt vector
+ * @vector: enable a particular Hw Interrupt vector, without base_vector
  **/
 void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
 {
@@ -3077,7 +3077,7 @@ void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
  I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |
  (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);
-   wr32(hw, I40E_PFINT_DYN_CTLN(vector - 1), val);
+   wr32(hw, I40E_PFINT_DYN_CTLN(vector + vsi->base_vector - 1), val);
/* skip the flush */
 }
 
@@ -3220,8 +3220,7 @@ static int i40e_vsi_enable_irq(struct i40e_vsi *vsi)
int i;
 
if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
-   for (i = vsi->base_vector;
-i < (vsi->num_q_vectors + vsi->base_vector); i++)
+   for (i = 0; i < vsi->num_q_vectors; i++)
i40e_irq_dynamic_enable(vsi, i);
} else {
i40e_irq_dynamic_enable_icr0(pf);
@@ -3453,8 +3452,7 @@ static bool i40e_clean_fdir_tx_irq(struct i40e_ring *tx_ring, int budget)
tx_ring->next_to_clean = i;
 
if (vsi->back->flags & I40E_FLAG_MSIX_ENABLED) {
-   i40e_irq_dynamic_enable(vsi,
-   tx_ring->q_vector->v_idx + vsi->base_vector);
+   i40e_irq_dynamic_enable(vsi, tx_ring->q_vector->v_idx);
}
return budget > 0;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 47dba9b0..5e1a7dc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1783,8 +1783,7 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
if (!test_bit(__I40E_DOWN, &vsi->state))
wr32(hw, I40E_PFINT_DYN_CTLN(vector - 1), val);
} else {
-   i40e_irq_dynamic_enable(vsi,
-   q_vector->v_idx + vsi->base_vector);
+   i40e_irq_dynamic_enable(vsi, q_vector->v_idx);
}
if (ITR_IS_DYNAMIC(vsi->tx_itr_setting)) {
old_itr = q_vector->tx.itr;
@@ -1806,8 +1805,7 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->v_idx +
  vsi->base_vector - 1), val);
} else {
-   i40e_irq_dynamic_enable(vsi,
-   q_vector->v_idx + vsi->base_vector);
+   i40e_irq_dynamic_enable(vsi, q_vector->v_idx);
}
 }
 
-- 
2.4.3


[net-next 04/17] i40e: Fix a port VLAN configuration bug

2015-09-30 Thread Jeff Kirsher

From: Greg Rose 

If a port VLAN is set for a given virtual function (VF) before the VF
driver is loaded then a configuration error results in which the port
VLAN is ignored when the VF driver is subsequently loaded.  This causes
the VF's MAC/VLAN filters to not use the correct VLAN filter.  This
patch ensures that the port VLAN filter is considered at the right time
during configuration of the VF's MAC/VLAN filters.

Change-ID: I28f404cbc21a4c6d70a7980b87c77f13f06685a4
Signed-off-by: Greg Rose 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 613da51..dce7d85 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1269,7 +1269,7 @@ bool i40e_is_vsi_in_vlan(struct i40e_vsi *vsi)
 * so we have to go through all the list in order to make sure
 */
list_for_each_entry(f, &vsi->mac_filter_list, list) {
-   if (f->vlan >= 0)
+   if (f->vlan >= 0 || vsi->info.pvid)
return true;
}
 
-- 
2.4.3


[net-next 00/17][pull request] Intel Wired LAN Driver Updates 2015-09-30

2015-09-30 Thread Jeff Kirsher

This series contains updates to i40e and i40evf only.

Vasily Averin provides a couple of rtnl lock/unlock fixes for both i40e
and i40evf.

Shannon provides several updates and fixes, first fixes up a type clash
in i40e_aq_rc_to_posix(), where the error codes are signed values, so we
need to treat them as such.  Then fixes up a padding issue where an
extra byte is added in i40e_aqc_get_cee_dcb_cfg_v1_resp to directly
acknowledge the padding.  Updated i40e to keep debugfs register reads
and writes from accessing outside of the io-remapped space.  Added
support and device id for another 20 GbE device.

Jesse fixes the transmit hang workaround code for ARM that was causing
Tx hangs to still occur occasionally when there really was no hang.  Then
fixed the receive dropped counter to show up in netstat interface.
Refactor the interrupt enable function since it was always making the
caller add the base_vector from the VSI struct which is already passed
to the function.  Fix kbuild warnings found in 0day build infrastructure
by adding a harmless cast to a dev_info(), also fix 32 bit build
warnings found by sparse.

Greg fixed a configuration error that results if a port VLAN is set
for a VF before the VF driver is loaded, so that when the VF driver is
loaded the port VLAN is ignored.

Mitch fixes the use of QOS field consistently in
i40e_ndo_set_vf_port_vlan().  Modified the init timing of the driver
to increase stability on load/unload and SR-IOV enable/disable cycles.

Anjali updates i40e to not collect VEB stats if they are disabled in the
hardware for performance reasons.

The following are changes since commit 4bf1b54f9df7ced4869f7dfd0bdf5eb22aa98447:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue master

Anjali Singhai Jain (1):
  i40e: Strip VEB stats if they are disabled in HW

Greg Rose (2):
  i40e: Fix a port VLAN configuration bug
  i40e: Remove useless message

Jesse Brandeburg (6):
  i40e/i40evf: fix Tx hang workaround code
  i40e: count drops in netstat interface
  i40e: refactor interrupt enable
  i40e: warn on double free
  i40e: fix kbuild warnings
  i40e: fix 32 bit build warnings

Mitch Williams (2):
  i40e: use QOS field consistently
  i40evf: tweak init timing

Shannon Nelson (4):
  i40e/i40evf: fix up type clash in i40e_aq_rc_to_posix conversion
  i40e: fixup padding issue in get_cee_dcb_cfg_v1_resp
  i40e: limit debugfs io ops
  i40e/i40evf: add new device id 1588

Vasily Averin (2):
  i40evf: missing rtnl_unlock in i40evf_resume()
  i40e: rtnl_lock called twice in i40e_pci_error_resume()

 drivers/net/ethernet/intel/i40e/i40e.h |  4 ++-
 drivers/net/ethernet/intel/i40e/i40e_adminq.h  |  9 +++--
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  | 13 +--
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  6 +---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 12 +++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 10 --
 drivers/net/ethernet/intel/i40e/i40e_main.c| 42 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c| 11 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h| 12 +++
 drivers/net/ethernet/intel/i40e/i40e_type.h|  1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  9 +++--
 drivers/net/ethernet/intel/i40evf/i40e_adminq.h|  9 +++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c|  1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |  3 +-
 drivers/net/ethernet/intel/i40evf/i40e_type.h  |  1 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  7 ++--
 16 files changed, 87 insertions(+), 63 deletions(-)

-- 
2.4.3


[net-next 03/17] i40e/i40evf: fix up type clash in i40e_aq_rc_to_posix conversion

2015-09-30 Thread Jeff Kirsher

From: Shannon Nelson 

The error codes sent into i40e_aq_rc_to_posix() are signed values, so we
really need to treat them as such.

Change-ID: I3d1ae0ee9ae0b1b6f5fc424f8b8cc58b0ea93203
Reported-by: Helin Zhang 
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.h   | 9 ++---
 drivers/net/ethernet/intel/i40evf/i40e_adminq.h | 9 ++---
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
index b67b34c..ca81b0b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
@@ -109,9 +109,10 @@ struct i40e_adminq_info {
 
 /**
  * i40e_aq_rc_to_posix - convert errors to user-land codes
- * aq_rc: AdminQ error code to convert
+ * aq_ret: AdminQ handler error code can override aq_rc
+ * aq_rc: AdminQ firmware error code to convert
  **/
-static inline int i40e_aq_rc_to_posix(u32 aq_ret, u16 aq_rc)
+static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
 {
int aq_to_posix[] = {
0,   /* I40E_AQ_RC_OK */
@@ -143,8 +144,10 @@ static inline int i40e_aq_rc_to_posix(u32 aq_ret, u16 aq_rc)
if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
return -EAGAIN;
 
-   if (aq_rc >= ARRAY_SIZE(aq_to_posix))
+   if (aq_rc >= (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0])) ||
+   aq_rc < 0)
return -ERANGE;
+
return aq_to_posix[aq_rc];
 }
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
index 547b79b..e62e951 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
@@ -109,9 +109,10 @@ struct i40e_adminq_info {
 
 /**
  * i40e_aq_rc_to_posix - convert errors to user-land codes
- * aq_rc: AdminQ error code to convert
+ * aq_ret: AdminQ handler error code can override aq_rc
+ * aq_rc: AdminQ firmware error code to convert
  **/
-static inline int i40e_aq_rc_to_posix(u32 aq_ret, u16 aq_rc)
+static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
 {
int aq_to_posix[] = {
0,   /* I40E_AQ_RC_OK */
@@ -143,8 +144,10 @@ static inline int i40e_aq_rc_to_posix(u32 aq_ret, u16 aq_rc)
if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
return -EAGAIN;
 
-   if (aq_rc >= ARRAY_SIZE(aq_to_posix))
+   if (aq_rc >= (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0])) ||
+   aq_rc < 0)
return -ERANGE;
+
return aq_to_posix[aq_rc];
 }
 
-- 
2.4.3
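
Once the error code is a signed int, a table lookup has to reject negative values as well as values past the end of the table. A small user-space sketch of that bounds check follows; the table rc_to_errno is deliberately shortened and, like the names rc_to_posix and ARRAY_LEN, is an assumption made for illustration rather than the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative table only; the driver's table has many more entries. */
static const int rc_to_errno[] = { 0, -EPERM, -ENOENT, -ESRCH };

/* Convert a signed return code to a negative errno, or -ERANGE if unknown. */
int rc_to_posix(int rc)
{
	/* Test the sign first so the index below can never be negative. */
	if (rc < 0 || (size_t)rc >= ARRAY_LEN(rc_to_errno))
		return -ERANGE;

	/* Every table entry is zero or a negative errno value. */
	assert(rc_to_errno[rc] <= 0);
	return rc_to_errno[rc];
}
```

Checking the sign before the cast keeps the intent explicit instead of relying on the implicit signed-to-unsigned conversion in the size comparison.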


[net-next 02/17] i40e: rtnl_lock called twice in i40e_pci_error_resume()

2015-09-30 Thread Jeff Kirsher

From: Vasily Averin 

Signed-off-by: Vasily Averin 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 52e58f30..613da51 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10537,7 +10537,7 @@ static void i40e_pci_error_resume(struct pci_dev *pdev)
 
rtnl_lock();
i40e_handle_reset_warning(pf);
-   rtnl_lock();
+   rtnl_unlock();
 }
 
 /**
-- 
2.4.3


[net-next 14/17] i40e: warn on double free

2015-09-30 Thread Jeff Kirsher

From: Jesse Brandeburg 

Down was requesting queue disables, but then exited immediately without
waiting for the queues to actually disable. This could allow any
function called after i40evf_down to run immediately, including
i40evf_up, and cause a memory leak.

This issue has been fixed in a recent refactor of the reset code, but
add a couple WARN_ONs in the slow path to help us recognize if we
reintroduce this issue or if we missed any cases.

Change-ID: I27b6b5c9a79c1892f0ba453129f116bc32647dd0
Signed-off-by: Jesse Brandeburg 
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5e1a7dc..0d692dd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -923,6 +923,8 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
if (!dev)
return -ENOMEM;
 
+   /* warn if we are about to overwrite the pointer */
+   WARN_ON(tx_ring->tx_bi);
bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
tx_ring->tx_bi = kzalloc(bi_size, GFP_KERNEL);
if (!tx_ring->tx_bi)
@@ -1083,6 +1085,8 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
struct device *dev = rx_ring->dev;
int bi_size;
 
+   /* warn if we are about to overwrite the pointer */
+   WARN_ON(rx_ring->rx_bi);
bi_size = sizeof(struct i40e_rx_buffer) * rx_ring->count;
rx_ring->rx_bi = kzalloc(bi_size, GFP_KERNEL);
if (!rx_ring->rx_bi)
-- 
2.4.3


Re: [PATCH net] ipv4: add support for "gratuitous" redirect

2015-09-30 Thread Paolo Abeni

On Tue, 2015-09-29 at 11:49 -0700, David Miller wrote:
> I don't know if it's a question of terminology but RFC1122 does
> seem to ask us to make this check, from 3.2.2.2:
> 
> A Redirect message SHOULD be silently discarded if ... the
> source of the Redirect is not the current first-hop gateway
> for the specified destination (see Section 3.3.1).

According to the same RFC:

*"SHOULD"

  This word or the adjective "RECOMMENDED" means that there
  may exist valid reasons in particular circumstances to
  ignore this item, but the full implications should be
  understood and the case carefully weighed before choosing
  a different course. 

So, if we agree that there are valid reasons we can make this check
optional.

> Whereas if we changed the routers to send ICMP redirects that these
> hosts would actually accept, you'd only have to make the change on
> the routers.  This is several orders of magnitude easier to deploy.

Actually fixing this behavior on the VRRP router requires somewhat
'NATing' ICMP redirect packets, which AFAIK is not possible on a Linux
box, since ICMP reply packets do not traverse the nat table.

To cope with the original issue on the router, an
'icmp_errors_use_inbound_daddr' sysctl would be needed that, when
enabled, forces the kernel to send ICMP error messages using the
destination address of the incoming packet as the source.

Eventually we can restrict the above option to icmp redirect only.

Do you think that adding such configuration would be more acceptable?

Some time ago, there was already some discussion on this topic:

http://comments.gmane.org/gmane.linux.kernel/1128505

Thank you,

Paolo


[RFC v2] ipv4: avoid timespec in timestamp computation

2015-09-30 Thread Arnd Bergmann

This is an attempt to avoid the use of timespec in ipv4, where
getnstimeofday() used to be used for computing the number of
milliseconds since midnight, in three places.

That computation would overflow in 2038 on 32-bit machines,
and the normal workaround for this is to use timespec64, which
in turn requires an expensive div_s64_rem() function call
for calculating the seconds modulo 86400.

Instead, this approach introduces a new generic helper function
that does this more efficiently, by using only a 32-bit modulo
(which the compiler can turn into two multiplications), relying
on 39 bits to be sufficient for the current time of day. This
is roughly 100 times faster than a full divmod operation on ARM.

As a further optimization, this does not use the exact nanosecond
value but instead relies on tk_xtime() to report the time of the
last jiffy, which is slightly less accurate, depending on the
value of HZ.

Signed-off-by: Arnd Bergmann 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Cc: John Stultz 
Cc: Thomas Gleixner 
---
v2: fixed build error reported by l...@intel.com

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ec89d846324c..db8fc5171294 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -38,6 +38,8 @@ extern void ktime_get_ts64(struct timespec64 *ts);
 extern time64_t ktime_get_seconds(void);
 extern time64_t ktime_get_real_seconds(void);
 
+extern u32 ktime_get_ms_since_midnight(void);
+
 extern int __getnstimeofday64(struct timespec64 *tv);
 extern void getnstimeofday64(struct timespec64 *tv);
 extern void getboottime64(struct timespec64 *ts);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 274ed5e88456..44ecd7058946 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -846,6 +846,40 @@ time64_t ktime_get_real_seconds(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_seconds);
 
+#if IS_ENABLED(CONFIG_INET)
+u32 ktime_get_ms_since_midnight(void)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   struct timespec64 now;
+   unsigned long seq;
+   u32 ms;
+
+   /* we assume that the coarse time is good enough here */
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+
+   now = tk_xtime(tk);
+   } while (read_seqcount_retry(&tk_core.seq, seq));
+
+   /*
+* efficiently calculate the milliseconds since midnight:
+* 86400 seconds per day == 2^7 * 675, which helps us
+* replace an expensive div_s64_rem() with a hand-written
+* 39-bit modulo on 32-bit architectures.
+*/
+   if (!IS_ENABLED(CONFIG_64BIT))
+   ms = (now.tv_sec & 0x7f) * MSEC_PER_SEC +
+((u32)(now.tv_sec >> 7) % 675) * 0x80 * MSEC_PER_SEC;
+   else
+   ms = (now.tv_sec % 86400) * MSEC_PER_SEC;
+
+   ms += now.tv_nsec / NSEC_PER_MSEC;
+
+   return ms;
+}
+EXPORT_SYMBOL_GPL(ktime_get_ms_since_midnight);
+#endif
+
 #ifdef CONFIG_NTP_PPS
 
 /**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e5eb8ac4089d..b1c53f2f7bd5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -914,7 +914,6 @@ static bool icmp_echo(struct sk_buff *skb)
  */
 static bool icmp_timestamp(struct sk_buff *skb)
 {
-   struct timespec tv;
struct icmp_bxm icmp_param;
/*
 *  Too short.
@@ -923,11 +922,10 @@ static bool icmp_timestamp(struct sk_buff *skb)
goto out_err;
 
/*
-*  Fill in the current time as ms since midnight UT:
+*  Fill in the current time as ms since midnight UT,
+*  this could probably be done faster.
 */
-   getnstimeofday(&tv);
-   icmp_param.data.times[1] = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC +
-tv.tv_nsec / NSEC_PER_MSEC);
+   icmp_param.data.times[1] = htonl(ktime_get_ms_since_midnight());
icmp_param.data.times[2] = icmp_param.data.times[1];
if (skb_copy_bits(skb, 0, &icmp_param.data.times[0], 4))
BUG();
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index bd246792360b..339ce528ecae 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -58,10 +58,8 @@ void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
if (opt->ts_needaddr)
ip_rt_get_source(iph+opt->ts+iph[opt->ts+2]-9, skb, rt);
if (opt->ts_needtime) {
-   struct timespec tv;
__be32 midtime;
-   getnstimeofday(&tv);
-   midtime = htonl((tv.tv_sec % 86400) * MSEC_PER_SEC + tv.tv_nsec / NSEC_PER_MSEC);
+   midtime = htonl(ktime_get_ms_since_midnight());
memcpy(iph+opt->ts+iph[opt->ts+2]-5, &midtime, 4);
}
return;
@@ -415,10 +413,7 @@ int ip_options_compile(struct net *net,
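
The modulo trick in ktime_get_ms_since_midnight() rests on 86400 == 128 * 675. Writing sec = 128 * q + r with r < 128, and q = 675 * k + m with m < 675, gives sec = 86400 * k + 128 * m + r, and 128 * m + r is always below 86400, so it equals sec % 86400. The user-space sketch below restates that computation next to a plain 64-bit reference; the function names are hypothetical, and the split version is only valid while the seconds value fits in 39 bits, as the patch description notes.

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u

/* Whole seconds since midnight, in ms, using only a 32-bit modulo. */
uint32_t ms_since_midnight_split(uint64_t sec)
{
	uint32_t low, high;

	/* (sec >> 7) must fit in 32 bits, i.e. sec must stay below 2^39. */
	assert(sec < (1ULL << 39));

	low = (uint32_t)(sec & 0x7f);		/* r = sec mod 128 */
	high = (uint32_t)(sec >> 7) % 675u;	/* m = (sec / 128) mod 675 */

	return low * MSEC_PER_SEC + high * 128u * MSEC_PER_SEC;
}

/* Reference version using a full 64-bit modulo. */
uint32_t ms_since_midnight_ref(uint64_t sec)
{
	return (uint32_t)(sec % 86400u) * MSEC_PER_SEC;
}
```

The largest value the split version can return is 127 * 1000 + 674 * 128 * 1000 = 86399000, the last whole second of a day, so the result always fits in 32 bits before the sub-second part is added.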

[RFC v3 net-next 00/10] Add new drivers: qed & qede

2015-09-30 Thread Yuval Mintz

From: Ariel Elior 

This series implements the driver set for Qlogic's new 579xx series.
These are 10/20/25/40/50/100 Gig capable converged nics, supporting
ethernet (obviously), iscsi, fcoe, roce and iwarp protocols.

The overall driver design includes a common module ('qed') and protocol
specific dependent modules for ethernet ('qede'), fcoe ('qedf'),
iscsi ('qedi') and roce ('qedr').
The common module contains all of the common logic, e.g. initialization,
cleanup, infrastructure for interrupt handling, link management, slowpath
etc. as well as protocol agnostic features, and supplying an abstraction
layer for other modules.
The protocol specific modules can be compiled and operated independently
of each other, with the exception of the rdma modules which are dependent
on the ethernet module, in accordance with the kernel rdma stack design.

This series only adds the core and ethernet modules, with basic L2
capabilities. Future series will add the rest of the modules and enhance
the L2 functionality.

This patch series is constructed of the following patches:
qed:  Add module with basic common support
qed:  Add basic L2 interface
qede: Add basic Network driver
qed:  Add slowpath L2 support
qede: Add basic network device support
qede: Add classification configuration
qed:  Add link support
qede: Add support for link
qed:  Add statistics support
qede: Add basic ethtool support

We don't expect the series to be accepted as is. We are looking for
upstream community feedback and guidance. Although the series is quite
large, it is what we viewed as the minimal set of patches to constitute
a basic L2 driver.

This project is a team effort, thanks go to Yuval Mintz, Dmitry Kravkov,
Michal Kalderon, Tomer Tayar, Manish Chopra, Sudarsana Kalluru,
Rajesh Borundia, Sony Chacko, Artum Zolotushko, Harish Patil, Rasesh Mody,
Sergey Ukhterov and Elad Manela, as well as former team members,
Eilon Greenstein and Shmulik Ravid.

Changes from previous version:
-

From Version 2:
  - Removed U64_{HI,LO}; Using {upper,lower}_32_bits instead.
  - Use regular napi weight definition.
  - [We still use the __le variants for variables, since we didn't get
 a reply regarding the change into non-user API types].

From Version 1:
  - Removed private license file; Instead revised comments at source headers.

Thanks,
Ariel Elior

[RFC v3 net-next 07/10] qed: Add link support

2015-09-30 Thread Yuval Mintz

Physical link is handled by the management Firmware.
This patch lays the infrastructure for attention handling in the driver,
as link change notifications arrive via async. attentions,
as well as the handling of such notifications.

This patch also extends the API with the protocol drivers by adding registered
callbacks which the protocol driver passes to qed in order to be notified
of async. events originating from the FW/HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed.h  |  20 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c  | 106 -
 drivers/net/ethernet/qlogic/qed/qed_int.c  | 340 -
 drivers/net/ethernet/qlogic/qed/qed_l2.c   |   9 +
 drivers/net/ethernet/qlogic/qed/qed_main.c | 212 ++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c  | 300 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h  | 126 ++-
 include/linux/qed/qed_eth_if.h |   4 +
 8 files changed, 1112 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 7373928..4e21d79 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -108,6 +108,18 @@ enum QED_FEATURE {
QED_MAX_FEATURES,
 };
 
+enum QED_PORT_MODE {
+   QED_PORT_MODE_DE_2X40G,
+   QED_PORT_MODE_DE_2X50G,
+   QED_PORT_MODE_DE_1X100G,
+   QED_PORT_MODE_DE_4X10G_F,
+   QED_PORT_MODE_DE_4X10G_E,
+   QED_PORT_MODE_DE_4X20G,
+   QED_PORT_MODE_DE_1X40G,
+   QED_PORT_MODE_DE_2X25G,
+   QED_PORT_MODE_DE_1X25G
+};
+
 struct qed_hw_info {
/* PCI personality */
enum qed_pci_personality personality;
@@ -404,6 +416,13 @@ struct qed_dev {
u8  protocol;
 #define IS_QED_ETH_IF(cdev) ((cdev)->protocol == QED_PROTOCOL_ETH)
 
+   /* Callbacks to protocol driver */
+   union {
+   struct qed_common_cb_ops *common;
+   struct qed_eth_cb_ops   *eth;
+   } protocol_ops;
+   void *ops_cookie;
+
const struct firmware   *firmware;
 };
 
@@ -453,6 +472,7 @@ static inline u8 qed_concrete_to_sw_fid(struct qed_dev *cdev,
 /* Prototypes */
 int qed_fill_dev_info(struct qed_dev   *cdev,
  struct qed_dev_info   *dev_info);
+void qed_link_update(struct qed_hwfn *hwfn);
 u32 qed_unzip_data(struct qed_hwfn *p_hwfn,
   u32 input_len, u8 *input_buf,
   u32 max_size, u8 *unzip_buf);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 30408b7..cde72e2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1040,8 +1040,9 @@ static void qed_hw_get_resc(struct qed_hwfn *p_hwfn)
 static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn,
   struct qed_ptt   *p_ptt)
 {
-   u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0;
-   u32 val;
+   u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0, core_cfg;
+   struct qed_mcp_link_params *link;
+   u32 port_cfg_addr, link_temp, val;
 
/* Read global nvm_cfg address */
u32 nvm_cfg_addr = qed_rd(p_hwfn, p_ptt, MISC_REG_GEN_PURP_CR0);
@@ -1061,6 +1062,48 @@ static int qed_hw_get_nvm_info(struct qed_hwfn   *p_hwfn,
   offsetof(struct nvm_cfg1_glob, pci_id);
p_hwfn->hw_info.vendor_id = qed_rd(p_hwfn, p_ptt, addr) &
NVM_CFG1_GLOB_VENDOR_ID_MASK;
+
+   addr = MCP_REG_SCRATCH + nvm_cfg1_offset +
+  offsetof(struct nvm_cfg1, glob) +
+  offsetof(struct nvm_cfg1_glob, core_cfg);
+
+   core_cfg = qed_rd(p_hwfn, p_ptt, addr);
+
+   switch ((core_cfg & NVM_CFG1_GLOB_NETWORK_PORT_MODE_MASK) >>
+   NVM_CFG1_GLOB_NETWORK_PORT_MODE_OFFSET) {
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X40G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X40G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X50G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X50G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X100G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_1X100G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_F:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_F;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_E:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_E;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X20G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X20G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X40G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_

[RFC v3 net-next 06/10] qede: classification configuration

2015-09-30 Thread Yuval Mintz

From: Sudarsana Kalluru 

Add the ability to configure basic classification in driver by
implementing ndo_set_mac_address() and ndo_set_rx_mode().

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
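A note on the deferral pattern used by qede_sp_task() in this patch: the
sketch below is a userspace illustration only, not driver code. It assumes
that the rx-mode callback merely records a request bit (it may be called
where sleeping is not allowed) and that a worker later consumes the bit
while the device is open. All demo_* names are hypothetical, and C11
atomics stand in for the kernel's test_and_clear_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state; not the real qede types. */
enum demo_state { DEMO_STATE_CLOSED, DEMO_STATE_OPEN };
#define DEMO_SP_RX_MODE 1u	/* bit number, mirrors QEDE_SP_RX_MODE */

struct demo_dev {
	atomic_ulong sp_flags;	/* pending slowpath requests, one bit each */
	enum demo_state state;
	int rx_mode_configs;	/* how many times the filters were reprogrammed */
};

/* May run where sleeping is forbidden: only record the request. */
static void demo_set_rx_mode(struct demo_dev *dev)
{
	atomic_fetch_or(&dev->sp_flags, 1ul << DEMO_SP_RX_MODE);
	/* A real driver would schedule its delayed work item here. */
}

/* Runs later in a sleepable context; consumes the request at most once. */
static void demo_sp_task(struct demo_dev *dev)
{
	unsigned long bit = 1ul << DEMO_SP_RX_MODE;
	unsigned long old;

	/* While closed, the request stays pending, as in the patch. */
	if (dev->state != DEMO_STATE_OPEN)
		return;

	old = atomic_fetch_and(&dev->sp_flags, ~bit);	/* test-and-clear */
	if (old & bit)
		dev->rx_mode_configs++;
}
```

A request made while the device is closed is not lost in this sketch: the
bit is only cleared once the worker runs with the device open.
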
 drivers/net/ethernet/qlogic/qede/qede.h  |  10 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 241 +++
 2 files changed, 251 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 424ef4a..7947942 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -87,6 +87,9 @@ struct qede_dev {
struct qed_update_vport_rss_params  rss_params;
u16 q_num_rx_buffers; /* Must be a power of two */
u16 q_num_tx_buffers; /* Must be a power of two */
+
+   struct delayed_work sp_task;
+   unsigned long   sp_flags;
 };
 
 enum QEDE_STATE {
@@ -184,6 +187,13 @@ struct qede_fastpath {
 
 #define QEDE_CSUM_ERROR        BIT(0)
 #define QEDE_CSUM_UNNECESSARY  BIT(1)
+
+#define QEDE_SP_RX_MODE        1
+
+union qede_reload_args {
+   u16 mtu;
+};
+
 #define RX_RING_SIZE_POW   13
 #define RX_RING_SIZE   BIT(RX_RING_SIZE_POW)
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 2894c8b..7e88365 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1030,10 +1030,31 @@ static irqreturn_t qede_msix_fp_int(int irq, void *fp_cookie)
 
 static int qede_open(struct net_device *ndev);
 static int qede_close(struct net_device *ndev);
+static int qede_set_mac_addr(struct net_device *ndev, void *p);
+static void qede_set_rx_mode(struct net_device *ndev);
+static void qede_config_rx_mode(struct net_device *ndev);
+
+static int qede_set_ucast_rx_mac(struct qede_dev *edev,
+enum qed_filter_xcast_params_type opcode,
+unsigned char mac[ETH_ALEN])
+{
+   struct qed_filter_params filter_cmd;
+
+   memset(&filter_cmd, 0, sizeof(filter_cmd));
+   filter_cmd.type = QED_FILTER_TYPE_UCAST;
+   filter_cmd.filter.ucast.type = opcode;
+   filter_cmd.filter.ucast.mac_valid = 1;
+   ether_addr_copy(filter_cmd.filter.ucast.mac, mac);
+
+   return edev->ops->filter_config(edev->cdev, &filter_cmd);
+}
+
 static const struct net_device_ops qede_netdev_ops = {
.ndo_open = qede_open,
.ndo_stop = qede_close,
.ndo_start_xmit = qede_start_xmit,
+   .ndo_set_rx_mode = qede_set_rx_mode,
+   .ndo_set_mac_address = qede_set_mac_addr,
.ndo_validate_addr = eth_validate_addr,
 };
 
@@ -1198,6 +1219,20 @@ err:
return -ENOMEM;
 }
 
+static void qede_sp_task(struct work_struct *work)
+{
+   struct qede_dev *edev = container_of(work, struct qede_dev,
+sp_task.work);
+   mutex_lock(&edev->qede_lock);
+
+   if (edev->state == QEDE_STATE_OPEN) {
+   if (test_and_clear_bit(QEDE_SP_RX_MODE, &edev->sp_flags))
+   qede_config_rx_mode(edev->ndev);
+   }
+
+   mutex_unlock(&edev->qede_lock);
+}
+
 static void qede_update_pf_params(struct qed_dev *cdev)
 {
struct qed_pf_params pf_params;
@@ -1269,6 +1304,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 
edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION);
 
+   INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+   mutex_init(&edev->qede_lock);
+
DP_INFO(edev, "Ending successfully qede probe\n");
 
return 0;
@@ -1306,6 +1344,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 
DP_INFO(edev, "Starting qede_remove\n");
 
+   cancel_delayed_work_sync(&edev->sp_task);
unregister_netdev(ndev);
 
edev->ops->common->set_power_state(cdev, PCI_D0);
@@ -2025,6 +2064,24 @@ static int qede_start_queues(struct qede_dev *edev)
return 0;
 }
 
+static int qede_set_mcast_rx_mac(struct qede_dev *edev,
+enum qed_filter_xcast_params_type opcode,
+unsigned char *mac, int num_macs)
+{
+   struct qed_filter_params filter_cmd;
+   int i;
+
+   memset(&filter_cmd, 0, sizeof(filter_cmd));
+   filter_cmd.type = QED_FILTER_TYPE_MCAST;
+   filter_cmd.filter.mcast.type = opcode;
+   filter_cmd.filter.mcast.num = num_macs;
+
+   for (i = 0; i < num_macs; i++, mac += ETH_ALEN)
+   ether_addr_copy(filter_cmd.filter.mcast.mac[i], mac);
+
+   return edev->ops->filter_config(edev->cdev, &filter_cmd);
+}
+
 enum qede_unload_mode {
QEDE_UNLOAD_NORMAL,
 };
@@ -2035,6 +2092,9 @@ static void q

[RFC v3 net-next 03/10] qede: Add basic Network driver

2015-09-30 Thread Yuval Mintz

The QLogic Everest Driver for Ethernet is the Ethernet-specific module for
579xx Ethernet products by QLogic.

This patch adds a very minimal PCI driver, one that doesn't yet register
a network device, but one that does interact with qed and does a basic
initialization of the HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
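A note on how DRV_MODULE_VERSION in qede.h is assembled: the sketch below
reproduces the idea in plain C. The DEMO_* names are hypothetical; the
two-level macro only mirrors the idea behind the kernel's __stringify()
helper, which is not available outside the kernel tree.

```c
/* Two-level expansion so that macro arguments are expanded before the
 * '#' operator turns them into a string literal.
 */
#define DEMO_STRINGIFY_1(x)	#x
#define DEMO_STRINGIFY(x)	DEMO_STRINGIFY_1(x)

#define DEMO_MAJOR_VERSION		8
#define DEMO_MINOR_VERSION		4
#define DEMO_REVISION_VERSION		0
#define DEMO_ENGINEERING_VERSION	0

/* Adjacent string literals are concatenated by the compiler. */
#define DEMO_MODULE_VERSION				\
	DEMO_STRINGIFY(DEMO_MAJOR_VERSION) "."		\
	DEMO_STRINGIFY(DEMO_MINOR_VERSION) "."		\
	DEMO_STRINGIFY(DEMO_REVISION_VERSION) "."	\
	DEMO_STRINGIFY(DEMO_ENGINEERING_VERSION)

/* Returns the dotted version string built at compile time. */
static const char *demo_version(void)
{
	return DEMO_MODULE_VERSION;
}
```

With a single-level macro the result would be the literal text
"DEMO_MAJOR_VERSION" rather than "8", which is why the indirection exists.
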
 drivers/net/ethernet/qlogic/Kconfig  |   5 +
 drivers/net/ethernet/qlogic/Makefile |   1 +
 drivers/net/ethernet/qlogic/qede/Makefile|   3 +
 drivers/net/ethernet/qlogic/qede/qede.h  |  73 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 354 +++
 5 files changed, 436 insertions(+)
 create mode 100644 drivers/net/ethernet/qlogic/qede/Makefile
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede.h
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_main.c

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 58c3fb3..30a6f24 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -97,4 +97,9 @@ config QED
---help---
  This enables the support for ...
 
+config QEDE
+   tristate "QLogic QED 25/40/100Gb Ethernet NIC"
+   depends on QED
+   ---help---
+ This enables the support for ...
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/Makefile b/drivers/net/ethernet/qlogic/Makefile
index 7600138..cee90e0 100644
--- a/drivers/net/ethernet/qlogic/Makefile
+++ b/drivers/net/ethernet/qlogic/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_QLCNIC) += qlcnic/
 obj-$(CONFIG_QLGE) += qlge/
 obj-$(CONFIG_NETXEN_NIC) += netxen/
 obj-$(CONFIG_QED) += qed/
+obj-$(CONFIG_QEDE) += qede/
diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
new file mode 100644
index 000..bedfe9f
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_QEDE) := qede.o
+
+qede-y := qede_main.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
new file mode 100644
index 000..7e2bcfa
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -0,0 +1,73 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the file COPYING in the main directory of
+* this source tree.
+*/
+
+#ifndef _QEDE_H_
+#define _QEDE_H_
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define QEDE_MAJOR_VERSION 8
+#define QEDE_MINOR_VERSION 4
+#define QEDE_REVISION_VERSION  0
+#define QEDE_ENGINEERING_VERSION   0
+#define DRV_MODULE_VERSION __stringify(QEDE_MAJOR_VERSION) "." \
+   __stringify(QEDE_MINOR_VERSION) "." \
+   __stringify(QEDE_REVISION_VERSION) "."  \
+   __stringify(QEDE_ENGINEERING_VERSION)
+
+#define QEDE_ETH_INTERFACE_VERSION 300
+
+#define DRV_MODULE_SYM qede
+
+struct qede_dev {
+   struct qed_dev  *cdev;
+   struct net_device   *ndev;
+   struct pci_dev  *pdev;
+
+   u32 dp_module;
+   u8  dp_level;
+
+   const struct qed_eth_ops*ops;
+
+   struct qed_dev_eth_info dev_info;
+#define QEDE_MAX_RSS_CNT(edev) ((edev)->dev_info.num_queues)
+#define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
+(edev)->dev_info.num_tc)
+
+   u16 num_rss;
+   u8  num_tc;
+#define QEDE_RSS_CNT(edev) ((edev)->num_rss)
+#define QEDE_TSS_CNT(edev) ((edev)->num_rss *  \
+(edev)->num_tc)
+#define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
+#define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+
+   struct qed_int_info int_info;
+   unsigned char   primary_mac[ETH_ALEN];
+
+   /* Smaller private variant of the RTNL lock */
+   struct mutex    qede_lock;
+   u32 state; /* Protected by qede_lock */
+};
+
+/* Debug print definitions */
+#define DP_NAME(edev) ((edev)->ndev->name)
+
+#endif /* _QEDE_H_ */
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
new file mode 100644
index 000..35065dc
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -0,0 +1,354 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the file COPYING in the main directory of
+* this so

[RFC v3 net-next 09/10] qed: Add statistics support

2015-09-30 Thread Yuval Mintz

From: Manish Chopra 

Device statistics can be gathered on-demand. This adds the qed support for
reading the statistics [both function and port] from the device, and adds
to the public API a method for requesting the current statistics.

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
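A note on the address/len descriptors added in struct storm_stats: the
sketch below shows, in userspace, how such a descriptor can drive an
on-demand copy out of device memory. The demo_ram window, the demo_* names
and the bounds check are assumptions made for illustration; the real code
reads through a PTT window with qed_memcpy_from().

```c
#include <stdint.h>
#include <string.h>

/* Where one statistics block lives in device memory, and how big it is. */
struct demo_storm_stats {
	uint32_t address;
	uint32_t len;
};

/* Simulated device memory window standing in for the BAR (hypothetical). */
static uint8_t demo_ram[256];

/* Hypothetical per-queue counters kept by the firmware. */
struct demo_queue_stat {
	uint32_t rcv_pkts;
	uint32_t rcv_bytes;
};

/* Copy the block described by 'd' into 'dst'.
 * Returns 0 on success, -1 if the region falls outside the window.
 */
static int demo_memcpy_from(void *dst, const struct demo_storm_stats *d)
{
	if (d->address > sizeof(demo_ram) ||
	    d->len > sizeof(demo_ram) - d->address)
		return -1;

	memcpy(dst, demo_ram + d->address, d->len);
	return 0;
}
```

Computing the descriptors once at init time, as qed_hw_init() does in this
patch, keeps the on-demand read path free of offset arithmetic.
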
 drivers/net/ethernet/qlogic/qed/qed.h |  14 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 244 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   3 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  30 
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |   3 +
 include/linux/qed/qed_eth_if.h|   3 +
 6 files changed, 296 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 4e21d79..f195cbd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -212,7 +212,20 @@ struct qed_qm_info {
u32 pf_rl;
 };
 
+struct storm_stats {
+   u32 address;
+   u32 len;
+};
+
+struct qed_storm_stats {
+   struct storm_stats mstats;
+   struct storm_stats pstats;
+   struct storm_stats tstats;
+   struct storm_stats ustats;
+};
+
 struct qed_fw_data {
+   struct fw_ver_info  *fw_ver_info;
const u8*modes_tree_buf;
union init_op   *init_ops;
const u32   *arr_data;
@@ -296,6 +309,7 @@ struct qed_hwfn {
 
/* QM init */
struct qed_qm_info  qm_info;
+   struct qed_storm_stats  storm_stats;
 
/* Buffer for unzipping firmware data */
void*unzip_buf;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index cde72e2..3993584 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -647,8 +647,10 @@ int qed_hw_init(struct qed_dev *cdev,
bool allow_npar_tx_switch,
const u8 *bin_fw_data)
 {
-   u32 load_code, param;
+   struct qed_storm_stats *p_stat;
+   u32 load_code, param, *p_address;
int rc, mfw_rc, i;
+   u8 fw_vport = 0;
 
rc = qed_init_fw_data(cdev, bin_fw_data);
if (rc != 0)
@@ -657,6 +659,10 @@ int qed_hw_init(struct qed_dev *cdev,
for_each_hwfn(cdev, i) {
struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
+   rc = qed_fw_vport(p_hwfn, 0, &fw_vport);
+   if (rc != 0)
+   return rc;
+
/* Enable DMAE in PXP */
rc = qed_change_pci_hwfn(p_hwfn, p_hwfn->p_main_ptt, true);
 
@@ -723,6 +729,25 @@ int qed_hw_init(struct qed_dev *cdev,
}
 
p_hwfn->hw_init_done = true;
+
+   /* init PF stats */
+   p_stat = &p_hwfn->storm_stats;
+   p_stat->mstats.address = BAR0_MAP_REG_MSDM_RAM +
+MSTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->mstats.len = sizeof(struct eth_mstorm_per_queue_stat);
+
+   p_stat->ustats.address = BAR0_MAP_REG_USDM_RAM +
+USTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->ustats.len = sizeof(struct eth_ustorm_per_queue_stat);
+
+   p_stat->pstats.address = BAR0_MAP_REG_PSDM_RAM +
+PSTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->pstats.len = sizeof(struct eth_pstorm_per_queue_stat);
+
+   p_address = &p_stat->tstats.address;
+   *p_address = BAR0_MAP_REG_TSDM_RAM +
+TSTORM_PORT_STAT_OFFSET(MFW_PORT(p_hwfn));
+   p_stat->tstats.len = sizeof(struct tstorm_per_port_stat);
}
 
return 0;
@@ -1503,6 +1528,223 @@ void qed_chain_free(struct qed_dev *cdev,
  p_chain->p_phys_addr);
 }
 
+static void __qed_get_vport_stats(struct qed_dev   *cdev,
+ struct qed_eth_stats  *stats)
+{
+   int i, j;
+
+   memset(stats, 0, sizeof(*stats));
+
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+   struct eth_mstorm_per_queue_stat mstats;
+   struct eth_ustorm_per_queue_stat ustats;
+   struct eth_pstorm_per_queue_stat pstats;
+   struct tstorm_per_port_stat tstats;
+   struct port_stats port_stats;
+   struct qed_ptt *p_ptt = qed_ptt_acquire(p_hwfn);
+
+   if (!p_ptt) {
+   DP_ERR(p_hwfn, "Failed to acquire ptt\n");
+   continue;
+   }
+
+   memset(&mstats, 0, sizeof(mstats));
+   qed_memcpy_from(p_hwfn, p_ptt, &mstats,
+   p_hwfn->storm_stats.mstats.address,
+

[RFC v3 net-next 10/10] qede: Add basic ethtool support

2015-09-30 Thread Yuval Mintz

From: Sudarsana Kalluru 

This adds basic ethtool operations to the qede driver, covering:
 - Statistics gathering [ethtool -S]
 - Setting of debug level [ethtool -s  msglvl]
 - Getting basic information [ethtool, ethtool -i]

In addition it adds the ability to change the MTU.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
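A note on exporting a flat stats structure through "ethtool -S": drivers
commonly pair each counter with its name via a table of offsets. The
sketch below illustrates that general pattern in userspace; it is an
assumption about the approach, not a copy of qede_ethtool.c, and every
demo_* name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A small subset of counters, named after fields of struct qede_stats. */
struct demo_stats {
	uint64_t rx_ucast_pkts;
	uint64_t tx_ucast_pkts;
	uint64_t rx_crc_errors;
};

/* One table entry per exported counter: where it lives and what to call it. */
struct demo_stat_desc {
	size_t offset;
	const char *name;
};

#define DEMO_STAT(field) { offsetof(struct demo_stats, field), #field }

static const struct demo_stat_desc demo_stats_arr[] = {
	DEMO_STAT(rx_ucast_pkts),
	DEMO_STAT(tx_ucast_pkts),
	DEMO_STAT(rx_crc_errors),
};

#define DEMO_NUM_STATS (sizeof(demo_stats_arr) / sizeof(demo_stats_arr[0]))

/* Fill 'buf' (DEMO_NUM_STATS entries) with the counters in table order. */
static void demo_get_stats(const struct demo_stats *s, uint64_t *buf)
{
	size_t i;

	for (i = 0; i < DEMO_NUM_STATS; i++)
		memcpy(&buf[i], (const char *)s + demo_stats_arr[i].offset,
		       sizeof(buf[i]));
}
```

Because names and values come from the same table, the string list and the
value list cannot drift out of order when a counter is added.
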
 drivers/net/ethernet/qlogic/qede/Makefile   |   2 +-
 drivers/net/ethernet/qlogic/qede/qede.h |  74 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 385 
 drivers/net/ethernet/qlogic/qede/qede_main.c| 137 -
 4 files changed, 596 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ethtool.c

diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
index bedfe9f..06ff90d 100644
--- a/drivers/net/ethernet/qlogic/qede/Makefile
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_QEDE) := qede.o
 
-qede-y := qede_main.o
+qede-y := qede_main.o qede_ethtool.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 7947942..ea00d5f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -36,6 +36,70 @@
 
 #define DRV_MODULE_SYM qede
 
+struct qede_stats {
+   u64 no_buff_discards;
+   u64 rx_ucast_bytes;
+   u64 rx_mcast_bytes;
+   u64 rx_bcast_bytes;
+   u64 rx_ucast_pkts;
+   u64 rx_mcast_pkts;
+   u64 rx_bcast_pkts;
+   u64 mftag_filter_discards;
+   u64 mac_filter_discards;
+   u64 tx_ucast_bytes;
+   u64 tx_mcast_bytes;
+   u64 tx_bcast_bytes;
+   u64 tx_ucast_pkts;
+   u64 tx_mcast_pkts;
+   u64 tx_bcast_pkts;
+   u64 tx_err_drop_pkts;
+   u64 coalesced_pkts;
+   u64 coalesced_events;
+   u64 coalesced_aborts_num;
+   u64 non_coalesced_pkts;
+   u64 coalesced_bytes;
+
+   /* port */
+   u64 rx_64_byte_packets;
+   u64 rx_127_byte_packets;
+   u64 rx_255_byte_packets;
+   u64 rx_511_byte_packets;
+   u64 rx_1023_byte_packets;
+   u64 rx_1518_byte_packets;
+   u64 rx_1522_byte_packets;
+   u64 rx_2047_byte_packets;
+   u64 rx_4095_byte_packets;
+   u64 rx_9216_byte_packets;
+   u64 rx_16383_byte_packets;
+   u64 rx_crc_errors;
+   u64 rx_mac_crtl_frames;
+   u64 rx_pause_frames;
+   u64 rx_pfc_frames;
+   u64 rx_align_errors;
+   u64 rx_carrier_errors;
+   u64 rx_oversize_packets;
+   u64 rx_jabbers;
+   u64 rx_undersize_packets;
+   u64 rx_fragments;
+   u64 tx_64_byte_packets;
+   u64 tx_65_to_127_byte_packets;
+   u64 tx_128_to_255_byte_packets;
+   u64 tx_256_to_511_byte_packets;
+   u64 tx_512_to_1023_byte_packets;
+   u64 tx_1024_to_1518_byte_packets;
+   u64 tx_1519_to_2047_byte_packets;
+   u64 tx_2048_to_4095_byte_packets;
+   u64 tx_4096_to_9216_byte_packets;
+   u64 tx_9217_to_16383_byte_packets;
+   u64 tx_pause_frames;
+   u64 tx_pfc_frames;
+   u64 tx_lpi_entry_count;
+   u64 tx_total_collisions;
+   u64 brb_truncates;
+   u64 brb_discards;
+   u64 tx_mac_ctrl_frames;
+};
+
 struct qede_dev {
struct qed_dev  *cdev;
struct net_device   *ndev;
@@ -84,6 +148,7 @@ struct qede_dev {
max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT,  \
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
+   struct qede_stats   stats;
struct qed_update_vport_rss_params  rss_params;
u16 q_num_rx_buffers; /* Must be a power of two */
u16 q_num_tx_buffers; /* Must be a power of two */
@@ -194,6 +259,15 @@ union qede_reload_args {
u16 mtu;
 };
 
+void qede_config_debug(uint debug, u32 *p_dp_module, u8 *p_dp_level);
+void qede_set_ethtool_ops(struct net_device *netdev);
+void qede_reload(struct qede_dev *edev,
+void (*func)(struct qede_dev *edev,
+ union qede_reload_args *args),
+union qede_reload_args *args);
+int qede_change_mtu(struct net_device *dev, int new_mtu);
+void qede_fill_by_demand_stats(struct qede_dev *edev);
+
 #define RX_RING_SIZE_POW   13
 #define RX_RING_SIZE   BIT(RX_RING_SIZE_POW)
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
new file mode 100644
index 000..3a36247
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -0,0 +1,385 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the file COPYING in the main directory of
+* this source tre

[RFC v3 net-next 05/10] qede: Add basic network device support

2015-09-30 Thread Yuval Mintz

From: Sudarsana Kalluru 

This patch includes the basic Rx/Tx support for the driver [although
carrier will still never be turned on].
Following this patch the driver registers a network device, initializes
it and prepares it for traffic.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
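A note on the Tx queue indexing macros (QEDE_TSS_IDX, QEDE_TC_IDX) and on
HILO_U64: the sketch below works through the arithmetic in userspace. The
DEMO_* macros take num_rss directly instead of a device pointer, and the
round-trip helper is a hypothetical addition for illustration.

```c
#include <stdint.h>

/* A flat Tx queue index is split into a fastpath slot and a traffic class. */
#define DEMO_TSS_IDX(num_rss, txqidx)	((txqidx) % (num_rss))
#define DEMO_TC_IDX(num_rss, txqidx)	((txqidx) / (num_rss))

/* Combine two 32-bit halves into one 64-bit value. */
#define DEMO_HILO_U64(hi, lo)	((((uint64_t)(hi)) << 32) + (lo))

/* Returns 1 if (tc, tss) maps back to the original flat index. */
static int demo_txq_roundtrip_ok(unsigned int num_rss, unsigned int txqidx)
{
	unsigned int tss = DEMO_TSS_IDX(num_rss, txqidx);
	unsigned int tc = DEMO_TC_IDX(num_rss, txqidx);

	return tc * num_rss + tss == txqidx;
}
```

For example, with num_rss = 4, flat index 6 lands on fastpath slot 2,
traffic class 1, so queues of the same traffic class are contiguous in the
flat numbering.
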
 drivers/net/ethernet/qlogic/qede/qede.h  |  128 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 1801 ++
 2 files changed, 1929 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 7e2bcfa..424ef4a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -51,6 +51,7 @@ struct qede_dev {
 #define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
 (edev)->dev_info.num_tc)
 
+   struct qede_fastpath    *fp_array;
u16 num_rss;
u8  num_tc;
 #define QEDE_RSS_CNT(edev) ((edev)->num_rss)
@@ -58,6 +59,9 @@ struct qede_dev {
 (edev)->num_tc)
 #define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
 #define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+#define QEDE_TX_QUEUE(edev, txqidx)\
+   (&(edev)->fp_array[QEDE_TSS_IDX((edev), (txqidx))].txqs[QEDE_TC_IDX( \
+   (edev), (txqidx))])
 
struct qed_int_info int_info;
unsigned char   primary_mac[ETH_ALEN];
@@ -65,9 +69,133 @@ struct qede_dev {
/* Smaller private variant of the RTNL lock */
struct mutex    qede_lock;
u32 state; /* Protected by qede_lock */
+   u16 rx_buf_size;
+   /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */
+#define ETH_OVERHEAD   (ETH_HLEN + 8 + 8)
+   /* Max supported alignment is 256 (8 shift)
+* minimal alignment shift 6 is optimal for 57xxx HW performance
+*/
+#define QEDE_RX_ALIGN_SHIFTmax(6, min(8, L1_CACHE_SHIFT))
+   /* We assume skb_build() uses sizeof(struct skb_shared_info) bytes
+* at the end of skb->data, to avoid wasting a full cache line.
+* This reduces memory use (skb->truesize).
+*/
+#define QEDE_FW_RX_ALIGN_END   \
+   max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT,  \
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+   struct qed_update_vport_rss_params  rss_params;
+   u16 q_num_rx_buffers; /* Must be a power of two */
+   u16 q_num_tx_buffers; /* Must be a power of two */
+};
+
+enum QEDE_STATE {
+   QEDE_STATE_CLOSED,
+   QEDE_STATE_OPEN,
+};
+
+#define HILO_U64(hi, lo)   ((((u64)(hi)) << 32) + (lo))
+
+#define MAX_NUM_TC  8
+#define MAX_NUM_PRI 8
+
+/* The driver supports the new build_skb() API:
+ * RX ring buffer contains pointer to kmalloc() data only,
+ * skb are built only after the frame was DMA-ed.
+ */
+struct sw_rx_data {
+   u8 *data;
+
+   DEFINE_DMA_UNMAP_ADDR(mapping);
+};
+
+struct qede_rx_queue {
+   __le16  *hw_cons_ptr;
+   struct sw_rx_data   *sw_rx_ring;
+   u16 sw_rx_cons;
+   u16 sw_rx_prod;
+   struct qed_chain    rx_bd_ring;
+   struct qed_chain    rx_comp_ring;
+   void __iomem        *hw_rxq_prod_addr;
+
+   int rx_buf_size;
+
+   u16 num_rx_buffers;
+   u16 rxq_id;
+
+   u64 rx_hw_errors;
+   u64 rx_alloc_errors;
+};
+
+union db_prod {
+   struct eth_db_data data;
+   u32 raw;
+};
+
+struct sw_tx_bd {
+   struct sk_buff *skb;
+   u8 flags;
+/* Set on the first BD descriptor when there is a split BD */
+#define QEDE_TSO_SPLIT_BD  BIT(0)
+};
+
+struct qede_tx_queue {
+   int index; /* Queue index */
+   __le16  *hw_cons_ptr;
+   struct sw_tx_bd *sw_tx_ring;
+   u16 sw_tx_cons;
+   u16 sw_tx_prod;
+   struct qed_chain    tx_pbl;
+   void __iomem        *doorbell_addr;
+   union db_prod   tx_db;
+
+   u16 num_tx_buffers;
+};
+
+#define BD_UNMAP_ADDR(bd)  HILO_U64(le32_to_cpu((bd)->addr.hi), \
+le32_to_cpu((bd)->addr.lo))
+#define BD_SET_UNMAP_ADDR_LEN(bd, maddr, len)  \
+   do {\
+   (bd)->addr.hi = cpu_to_le32(upper_32_

[RFC v3 net-next 02/10] qed: Add basic L2 interface

2015-09-30 Thread Yuval Mintz

From: Manish Chopra 

This patch adds a public API for a network driver to work on top of QED.
The interface itself is very minimal - it's mostly infrastructure, as the
only content it has after this patch is a query for HW-based information
required for the creation of a network interface [i.e., no actual
protocol-specific configurations are supported].

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
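A note on the shape of the public API: later patches in this series call
into qed through a table of function pointers (edev->ops->...). The sketch
below shows that pattern in userspace with a single device-info query. The
demo_* types, the field set and the getter are hypothetical and do not
reproduce the contents of qed_eth_if.h.

```c
#include <stdint.h>

/* Core-side device state (hypothetical fields). */
struct demo_dev {
	uint8_t num_queues;
	uint8_t num_tc;
};

/* HW-derived information a network driver needs to create its interface. */
struct demo_dev_eth_info {
	uint8_t num_queues;
	uint8_t num_tc;
};

/* The public interface: the upper driver only ever sees this table. */
struct demo_eth_ops {
	int (*fill_dev_info)(struct demo_dev *cdev,
			     struct demo_dev_eth_info *info);
};

static int demo_fill_dev_info(struct demo_dev *cdev,
			      struct demo_dev_eth_info *info)
{
	if (!cdev || !info)
		return -1;

	info->num_queues = cdev->num_queues;
	info->num_tc = cdev->num_tc;
	return 0;
}

static const struct demo_eth_ops demo_eth_ops_pass = {
	.fill_dev_info = demo_fill_dev_info,
};

/* Entry point the upper driver uses to obtain the table. */
static const struct demo_eth_ops *demo_get_eth_ops(void)
{
	return &demo_eth_ops_pass;
}
```

Keeping the upper driver behind an ops table means new operations can be
added in later patches without changing how the two modules are linked.
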
 drivers/net/ethernet/qlogic/qed/Makefile  |   2 +-
 drivers/net/ethernet/qlogic/qed/qed.h |  14 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  62 +++
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |   1 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |  87 ++
 include/linux/qed/eth_common.h| 278 ++
 include/linux/qed/qed_eth_if.h|  38 
 7 files changed, 481 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_l2.c
 create mode 100644 include/linux/qed/eth_common.h
 create mode 100644 include/linux/qed/qed_eth_if.h

diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 5bbe0c7..dbe6938 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_QED) := qed.o
 
-qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
+qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_l2.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index f9f01bb..7373928 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -24,6 +24,7 @@
 #include 
 #include "qed_hsi.h"
 
+extern const struct qed_common_ops qed_common_ops_pass;
 #define DRV_MODULE_VERSION "8.4.0.0"
 
 #define MAX_HWFNS_PER_DEVICE(4)
@@ -91,13 +92,22 @@ struct qed_qm_iids {
 
 enum QED_RESOURCES {
QED_SB,
+   QED_L2_QUEUE,
QED_VPORT,
+   QED_RSS_ENG,
QED_PQ,
QED_RL,
+   QED_MAC,
+   QED_VLAN,
QED_ILT,
QED_MAX_RESC,
 };
 
+enum QED_FEATURE {
+   QED_PF_L2_QUE,
+   QED_MAX_FEATURES,
+};
+
 struct qed_hw_info {
/* PCI personality */
enum qed_pci_personality    personality;
@@ -105,6 +115,7 @@ struct qed_hw_info {
/* Resource Allocation scheme results */
u32 resc_start[QED_MAX_RESC];
u32 resc_num[QED_MAX_RESC];
+   u32 feat_num[QED_MAX_FEATURES];
 
 #define RESC_START(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_start[resc])
 #define RESC_NUM(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_num[resc])
@@ -266,6 +277,9 @@ struct qed_hwfn {
 
struct qed_mcp_info *mcp_info;
 
+   struct qed_hw_cid_data  *p_tx_cids;
+   struct qed_hw_cid_data  *p_rx_cids;
+
struct qed_dmae_info    dmae_info;
 
/* QM init */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 7769720..1053388d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -94,6 +94,15 @@ void qed_resc_free(struct qed_dev *cdev)
for_each_hwfn(cdev, i) {
struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
+   kfree(p_hwfn->p_tx_cids);
+   p_hwfn->p_tx_cids = NULL;
+   kfree(p_hwfn->p_rx_cids);
+   p_hwfn->p_rx_cids = NULL;
+   }
+
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+
qed_cxt_mngr_free(p_hwfn);
qed_qm_info_free(p_hwfn);
qed_spq_free(p_hwfn);
@@ -204,6 +213,29 @@ int qed_resc_alloc(struct qed_dev *cdev)
if (!cdev->fw_data)
return -ENOMEM;
 
+   /* Allocate Memory for the Queue->CID mapping */
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+   int tx_size = sizeof(struct qed_hw_cid_data) *
+ RESC_NUM(p_hwfn, QED_L2_QUEUE);
+   int rx_size = sizeof(struct qed_hw_cid_data) *
+ RESC_NUM(p_hwfn, QED_L2_QUEUE);
+
+   p_hwfn->p_tx_cids = kzalloc(tx_size, GFP_KERNEL);
+   if (!p_hwfn->p_tx_cids) {
+   DP_NOTICE(p_hwfn,
+ "Failed to allocate memory for Tx Cids\n");
+   goto alloc_err;
+   }
+
+   p_hwfn->p_rx_cids = kzalloc(rx_size, GFP_KERNEL);
+   if (!p_hwfn->p_rx_cids) {
+   DP_NOTICE(p_hwfn,
+ "Failed to allocate memory for Rx Cids\n");
+

[RFC v3 net-next 08/10] qede: Add support for link

2015-09-30 Thread Yuval Mintz

From: Sudarsana Kalluru 

This adds basic link functionality to qede - driver still doesn't provide
users with an API to change any link property, but it does request qed to
initialize the link using default configuration, and registers a callback
that allows it to get link notifications.

This patch adds the ability of the driver to set the carrier as active and
to enable traffic as a result of async. link notifications.
Following this patch, driver should be capable of running traffic.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
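A note on the notification flow described above: the sketch below models
it in userspace. A core module stores a callback table plus an opaque
cookie, and the callback mirrors link state into a carrier flag only while
the interface is running. All demo_* names are hypothetical; the real code
uses register_ops(), netif_carrier_on() and netif_carrier_off().

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_link_output {
	bool link_up;
};

/* Callbacks the upper driver hands to the core module. */
struct demo_cb_ops {
	void (*link_update)(void *cookie, struct demo_link_output *link);
};

/* Core-side registration state. */
struct demo_core {
	const struct demo_cb_ops *ops;
	void *cookie;
};

/* Upper-driver state: a stand-in for the net_device flags. */
struct demo_netdev {
	bool running;
	bool carrier;
};

static void demo_register_ops(struct demo_core *core,
			      const struct demo_cb_ops *ops, void *cookie)
{
	core->ops = ops;
	core->cookie = cookie;
}

/* Upper-driver callback: ignore events while the interface is down. */
static void demo_link_update(void *cookie, struct demo_link_output *link)
{
	struct demo_netdev *nd = cookie;

	if (!nd->running)
		return;

	nd->carrier = link->link_up;
}

static const struct demo_cb_ops demo_ll_ops = {
	.link_update = demo_link_update,
};

/* Core side: deliver an asynchronous link event to whoever registered. */
static void demo_core_link_event(struct demo_core *core, bool up)
{
	struct demo_link_output out = { .link_up = up };

	if (core->ops && core->ops->link_update)
		core->ops->link_update(core->cookie, &out);
}
```

The same callback can also be invoked directly after a synchronous link
query, which is how qede_load() in this patch handles a link that is
already up.
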
 drivers/net/ethernet/qlogic/qede/qede_main.c | 47 
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 7e88365..11815b5 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -87,6 +87,7 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id);
 static void qede_remove(struct pci_dev *pdev);
 static int qede_alloc_rx_buffer(struct qede_dev *edev,
struct qede_rx_queue *rxq);
+static void qede_link_update(void *dev, struct qed_link_output *link);
 
 static struct pci_driver qede_pci_driver = {
.name = "qede",
@@ -95,6 +96,12 @@ static struct pci_driver qede_pci_driver = {
.remove = qede_remove,
 };
 
+static struct qed_eth_cb_ops qede_ll_ops = {
+   {
+   .link_update = qede_link_update,
+   },
+};
+
 static int qede_netdev_event(struct notifier_block *this, unsigned long event,
 void *ptr)
 {
@@ -1304,6 +1311,8 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 
edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION);
 
+   edev->ops->register_ops(cdev, &qede_ll_ops, edev);
+
INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
mutex_init(&edev->qede_lock);
 
@@ -2088,6 +2097,7 @@ enum qede_unload_mode {
 
 static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode)
 {
+   struct qed_link_params link_params;
int rc;
 
DP_INFO(edev, "Starting qede unload\n");
@@ -2099,6 +2109,10 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode)
netif_tx_disable(edev->ndev);
netif_carrier_off(edev->ndev);
 
+   /* Reset the link */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = false;
+   edev->ops->common->set_link(edev->cdev, &link_params);
rc = qede_stop_queues(edev);
if (rc) {
qede_sync_free_irqs(edev);
@@ -2129,6 +2143,8 @@ enum qede_load_mode {
 
 static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
 {
+   struct qed_link_params link_params;
+   struct qed_link_output link_output;
int rc;
 
DP_INFO(edev, "Starting qede load\n");
@@ -2172,6 +2188,17 @@ static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
mutex_lock(&edev->qede_lock);
edev->state = QEDE_STATE_OPEN;
mutex_unlock(&edev->qede_lock);
+
+   /* Ask for link-up using current configuration */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = true;
+   edev->ops->common->set_link(edev->cdev, &link_params);
+
+   /* Query whether link is already-up */
+   memset(&link_output, 0, sizeof(link_output));
+   edev->ops->common->get_link(edev->cdev, &link_output);
+   qede_link_update(edev, &link_output);
+
DP_INFO(edev, "Ending successfully qede load\n");
 
return 0;
@@ -2217,6 +2244,26 @@ static int qede_close(struct net_device *ndev)
return 0;
 }
 
+static void qede_link_update(void *dev, struct qed_link_output *link)
+{
+   struct qede_dev *edev = dev;
+
+   if (!netif_running(edev->ndev)) {
+   DP_VERBOSE(edev, NETIF_MSG_LINK, "Interface is not running\n");
+   return;
+   }
+
+   if (link->link_up) {
+   DP_NOTICE(edev, "Link is up\n");
+   netif_tx_start_all_queues(edev->ndev);
+   netif_carrier_on(edev->ndev);
+   } else {
+   DP_NOTICE(edev, "Link is down\n");
+   netif_tx_disable(edev->ndev);
+   netif_carrier_off(edev->ndev);
+   }
+}
+
 static int qede_set_mac_addr(struct net_device *ndev, void *p)
 {
struct qede_dev *edev = netdev_priv(ndev);
-- 
1.9.3

[RFC v3 net-next 04/10] qed: Add slowpath L2 support

2015-09-30 Thread Yuval Mintz

From: Manish Chopra 

This patch adds support in qed for configuring various L2 elements,
such as channels and basic filtering conditions.
It also extends the public API so that qede can later use this
functionality.

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
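A note on the id translation done by qed_fw_l2_queue() and qed_fw_vport():
each function owns a window [start, start + num) of a hardware resource,
and callers pass window-relative ids. The sketch below shows the check and
the translation in userspace; struct demo_resc and the function name are
hypothetical stand-ins for the RESC_START()/RESC_NUM() accessors.

```c
#include <errno.h>
#include <stdint.h>

/* The slice of a hardware resource assigned to one function. */
struct demo_resc {
	uint32_t start;	/* first absolute id owned by this function */
	uint32_t num;	/* number of ids owned */
};

/* Translate a relative queue id into the absolute id used by firmware.
 * Returns 0 on success, -EINVAL if src_id is outside the window.
 */
static int demo_fw_l2_queue(const struct demo_resc *r, uint16_t src_id,
			    uint16_t *dst_id)
{
	if (src_id >= r->num)
		return -EINVAL;

	*dst_id = (uint16_t)(r->start + src_id);
	return 0;
}
```

Rejecting out-of-range ids before translating keeps one function from ever
addressing a queue that belongs to another.
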
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  126 ++
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   58 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  294 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c  | 1672 +
 drivers/net/ethernet/qlogic/qed/qed_main.c|   10 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |   16 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |   13 +
 drivers/net/ethernet/qlogic/qed/qed_sp.h  |   27 +
 drivers/net/ethernet/qlogic/qed/qed_spq.c |   29 +
 include/linux/qed/qed_eth_if.h|  115 ++
 10 files changed, 2360 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 1053388d..30408b7 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -800,6 +800,60 @@ int qed_hw_stop(struct qed_dev *cdev)
return rc;
 }
 
+void qed_hw_stop_fastpath(struct qed_dev *cdev)
+{
+   int i, j;
+
+   for_each_hwfn(cdev, j) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[j];
+   struct qed_ptt *p_ptt   = p_hwfn->p_main_ptt;
+
+   DP_VERBOSE(p_hwfn,
+  NETIF_MSG_IFDOWN,
+  "Shutting down the fastpath\n");
+
+   qed_wr(p_hwfn, p_ptt,
+  NIG_REG_RX_LLH_BRB_GATE_DNTFWD_PERPF, 0x1);
+
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_TCP, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_UDP, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_FCOE, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_ROCE, 0x0);
+   qed_wr(p_hwfn, p_ptt, PRS_REG_SEARCH_OPENFLOW, 0x0);
+
+   qed_wr(p_hwfn, p_ptt, TM_REG_PF_ENABLE_CONN, 0x0);
+   qed_wr(p_hwfn, p_ptt, TM_REG_PF_ENABLE_TASK, 0x0);
+   for (i = 0; i < QED_HW_STOP_RETRY_LIMIT; i++) {
+   if ((!qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_CONN)) &&
+   (!qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_TASK)))
+   break;
+
+   usleep_range(1000, 2000);
+   }
+   if (i == QED_HW_STOP_RETRY_LIMIT)
+   DP_NOTICE(p_hwfn,
+ "Timers linear scans are not over [Connection %02x Tasks %02x]\n",
+ (u8)qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_CONN),
+ (u8)qed_rd(p_hwfn, p_ptt,
+TM_REG_PF_SCAN_ACTIVE_TASK));
+
+   qed_int_igu_init_pure_rt(p_hwfn, p_ptt, false, false);
+
+   /* Need to wait 1ms to guarantee SBs are cleared */
+   usleep_range(1000, 2000);
+   }
+}
+
+void qed_hw_start_fastpath(struct qed_hwfn *p_hwfn)
+{
+   /* Re-open incoming traffic */
+   qed_wr(p_hwfn, p_hwfn->p_main_ptt,
+  NIG_REG_RX_LLH_BRB_GATE_DNTFWD_PERPF, 0x0);
+}
+
 static int qed_reg_assert(struct qed_hwfn *hwfn,
  struct qed_ptt *ptt, u32 reg,
  bool expected)
@@ -1346,3 +1400,75 @@ void qed_chain_free(struct qed_dev *cdev,
  p_chain->p_virt_addr,
  p_chain->p_phys_addr);
 }
+
+int qed_fw_l2_queue(struct qed_hwfn *p_hwfn,
+   u16 src_id,
+   u16 *dst_id)
+{
+   if (src_id >= RESC_NUM(p_hwfn, QED_L2_QUEUE)) {
+   u16 min, max;
+
+   min = (u16)RESC_START(p_hwfn, QED_L2_QUEUE);
+   max = min + RESC_NUM(p_hwfn, QED_L2_QUEUE);
+   DP_NOTICE(
+   p_hwfn,
+   "l2_queue id [%d] is not valid, available indices [%d - %d]\n",
+   src_id,
+   min,
+   max);
+
+   return -EINVAL;
+   }
+
+   *dst_id = RESC_START(p_hwfn, QED_L2_QUEUE) + src_id;
+
+   return 0;
+}
+
+int qed_fw_vport(struct qed_hwfn   *p_hwfn,
+u8  src_id,
+u8  *dst_id)
+{
+   if (src_id >= RESC_NUM(p_hwfn, QED_VPORT)) {
+   u8 min, max;
+
+   min = (u8)RESC_START(p_hwfn, QED_VPORT);
+   max = min + RESC_NUM(p_hwfn, QED_VPORT);
+   DP_NOTICE(
+   p_hwfn,
+   "vport id [%d] is not valid, available indices [%d - %d]\n",
+

[PATCH net-next] tcp: restore fastopen operations

2015-09-30 Thread Eric Dumazet

From: Eric Dumazet 

I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc()
while max_qlen can be set before listen() is called,
using TCP_FASTOPEN socket option for example.

Fixes: 0536fcc039a8 ("tcp: prepare fastopen code for upcoming listener changes")
Signed-off-by: Eric Dumazet 
---
 net/core/request_sock.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index e22cfa4ed25f817dbc4871e1618349b2..ef031d959d80f2b45e4077680026a0f8 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -64,7 +64,6 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
queue->fastopenq.rskq_rst_head = NULL;
queue->fastopenq.rskq_rst_tail = NULL;
queue->fastopenq.qlen = 0;
-   queue->fastopenq.max_qlen = 0;
 
queue->rskq_accept_head = NULL;
lopt->nr_table_entries = nr_table_entries;
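
The effect of the one-line removal above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct fastopen_queue_model and the helper names below are hypothetical stand-ins, not kernel code. It models a limit that is configured first (as TCP_FASTOPEN can do before listen()) and a queue allocation step that runs afterwards.

```c
#include <assert.h>

/* Toy stand-in for the kernel's fastopen queue; only the two fields
 * touched by the hunk above are modeled (hypothetical type). */
struct fastopen_queue_model {
	int qlen;	/* requests currently queued */
	int max_qlen;	/* limit configured by the application */
};

/* Models the behaviour before the fix: allocation resets the limit too. */
static void queue_alloc_before_fix(struct fastopen_queue_model *q)
{
	q->qlen = 0;
	q->max_qlen = 0;
}

/* Models the behaviour after the fix: the configured limit is left alone. */
static void queue_alloc_after_fix(struct fastopen_queue_model *q)
{
	q->qlen = 0;
}

/* Returns the limit that survives "configure first, allocate later". */
static int limit_after_listen(int requested, int fixed)
{
	struct fastopen_queue_model q = { .qlen = 0, .max_qlen = requested };

	if (fixed)
		queue_alloc_after_fix(&q);
	else
		queue_alloc_before_fix(&q);

	assert(q.qlen == 0);	/* both variants reset the current length */
	return q.max_qlen;
}
```

In this model, limit_after_listen(16, 0) loses the configured value, while limit_after_listen(16, 1) keeps it, which is the situation the commit message describes.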



Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Rainer Weikusat

Mathias Krause  writes:
> On 30 September 2015 at 12:56, Rainer Weikusat wrote:
>> Mathias Krause writes:
>>> On 29 September 2015 at 21:09, Jason Baron wrote:
>>>> However, if we call connect on socket 's', to connect to a new socket
>>>> 'o2', we drop the reference on the original socket 'o'. Thus, we can now
>>>> close socket 'o' without unregistering from epoll. Then, when we either
>>>> close the ep or unregister 'o', we end up with this list corruption.
>>>> Thus, this is not a race per se, but can be triggered sequentially.
>>>
>>> Sounds profound, but the reproducers calls connect only once per
>>> socket. So there is no "connect to a new socket", no?
>>> But w/e, see below.
>>
>> In case you want some information on this: This is a kernel warning I
>> could trigger (more than once) on the single day I could so far spend
>> looking into this (3.2.54 kernel):
>>
>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 
>> list_del+0x9/0x30()
>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should 
>> be 88022c38f078, but was dead00100100
>> [snip]
>
> Is that with Jason's patch or a vanilla v3.2.54?

That's a kernel warning which occurred repeatedly (among other "link 
pointer disorganization" warnings) when I tested the "program with
unknown behaviour" you wrote with the kernel I'm currently supporting a
while ago (as I already wrote in the original mail).

[PATCH] qed: fix simple_return.cocci warnings

2015-09-30 Thread kbuild test robot

drivers/net/ethernet/qlogic/qed/qed_spq.c:673:1-3: WARNING: end returns can be 
simpified

 Simplify a trivial if-return sequence.  Possibly combine with a
 preceding function call.

Generated by: scripts/coccinelle/misc/simple_return.cocci

CC: Yuval Mintz 
Signed-off-by: Fengguang Wu 
---

 qed_spq.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -670,13 +670,9 @@ static int qed_spq_pend_post(struct qed_
qed_spq_add_entry(p_hwfn, p_ent, p_ent->priority);
}
 
-   rc = qed_spq_post_list(p_hwfn,
+   return qed_spq_post_list(p_hwfn,
   &p_spq->pending,
   SPQ_HIGH_PRI_RESERVE_DEFAULT);
-   if (rc)
-   return rc;
-
-   return 0;
 }
 
 int qed_spq_post(struct qed_hwfn   *p_hwfn,

[PATCH] qed: fix kzalloc-simple.cocci warnings

2015-09-30 Thread kbuild test robot

drivers/net/ethernet/qlogic/qed/qed_int.c:644:30-37: WARNING: kzalloc should be 
used for p_hwfn -> hw_info . p_igu_info, instead of kmalloc/memset


 Use kzalloc rather than kmalloc followed by memset with 0

 This considers some simple cases that are common and easy to validate
 Note in particular that there are no ...s in the rule, so all of the
 matched code has to be contiguous

Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci

CC: Yuval Mintz 
Signed-off-by: Fengguang Wu 
---

 qed_int.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -641,13 +641,11 @@ int qed_int_igu_read_cam(struct qed_hwfn
u16 sb_id;
u16 prev_sb_id = 0xFF;
 
-   p_hwfn->hw_info.p_igu_info = kmalloc(sizeof(*p_igu_info), GFP_ATOMIC);
+   p_hwfn->hw_info.p_igu_info = kzalloc(sizeof(*p_igu_info), GFP_ATOMIC);
 
if (!p_hwfn->hw_info.p_igu_info)
return -ENOMEM;
 
-   memset(p_hwfn->hw_info.p_igu_info, 0, sizeof(*p_igu_info));
-
p_igu_info = p_hwfn->hw_info.p_igu_info;
 
/* Initialize base sb / sb cnt for PFs */
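
For readers unfamiliar with the pattern the Coccinelle rule targets, here is a userspace analogue. It is a sketch only: calloc() stands in for kzalloc(), malloc() plus memset() stands in for kmalloc() plus memset(), and struct igu_info_model is a hypothetical placeholder rather than the driver's real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholder for the structure being allocated. */
struct igu_info_model {
	unsigned short sb_count;
	unsigned char flags[8];
};

/* Two-step form flagged by the rule: allocate, then clear. */
static struct igu_info_model *alloc_two_step(void)
{
	struct igu_info_model *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	memset(p, 0, sizeof(*p));
	return p;
}

/* One-step form: the allocator hands back zeroed memory. */
static struct igu_info_model *alloc_one_step(void)
{
	return calloc(1, sizeof(struct igu_info_model));
}

/* Returns 1 when every byte of the object is zero. */
static int is_all_zero(const void *obj, size_t len)
{
	const unsigned char *bytes = obj;
	size_t i;

	for (i = 0; i < len; i++)
		if (bytes[i] != 0)
			return 0;
	return 1;
}
```

Both helpers return memory in the same state; the one-step form simply cannot forget the clearing step or clear the wrong size.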

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Mathias Krause

On 30 September 2015 at 15:25, Rainer Weikusat wrote:
> Mathias Krause  writes:
>> On 30 September 2015 at 12:56, Rainer Weikusat 
>>  wrote:
>>> In case you want some information on this: This is a kernel warning I
>>> could trigger (more than once) on the single day I could so far spend
>>> looking into this (3.2.54 kernel):
>>>
>>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 
>>> list_del+0x9/0x30()
>>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should 
>>> be 88022c38f078, but was dead00100100
>>> [snip]
>>
>> Is that with Jason's patch or a vanilla v3.2.54?
>
> That's a kernel warning which occurred repeatedly (among other "link
> pointer disorganization" warnings) when I tested the "program with
> unknown behaviour" you wrote with the kernel I'm currently supporting a
> while ago (as I already wrote in the original mail).

So I assume Jason's patch is not included in your kernel. Then those
messages are expected; expected even on kernels as old as v2.6.27.
Can you re-try with Jason's patch applied?

Thanks,
Mathias

Re: [RFC v3 net-next 04/10] qed: Add slowpath L2 support

2015-09-30 Thread kbuild test robot

Hi Manish,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

config: i386-allmodconfig (attached as .config)
reproduce:
  git checkout 94302d8beecd786eb3a56ca1be467d3493c8943d
  # save the attached .config to linux build tree
  make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/byteorder/little_endian.h:4:0,
from arch/x86/include/uapi/asm/byteorder.h:4,
from drivers/net/ethernet/qlogic/qed/qed_l2.c:10:
   include/linux/qed/qed_chain.h: In function 'qed_chain_init':
   include/linux/qed/qed_chain.h:20:58: warning: right shift count >= width of 
type [-Wshift-count-overflow]
#define DMA_HI(x)   ((u32)(((dma_addr_t)(x)) >> 32))
 ^
   include/uapi/linux/byteorder/little_endian.h:32:51: note: in definition of 
macro '__cpu_to_le32'
#define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
  ^
   include/linux/qed/qed_chain.h:23:45: note: in expansion of macro 'DMA_HI'
#define DMA_HI_LE(x)cpu_to_le32(DMA_HI(x))
^
   include/linux/qed/qed_chain.h:420:27: note: in expansion of macro 'DMA_HI_LE'
   p_next->next_phys.hi = DMA_HI_LE(p_phys_addr);
  ^
   include/linux/qed/qed_chain.h:20:58: warning: right shift count >= width of 
type [-Wshift-count-overflow]
#define DMA_HI(x)   ((u32)(((dma_addr_t)(x)) >> 32))
 ^
   include/uapi/linux/byteorder/little_endian.h:32:51: note: in definition of 
macro '__cpu_to_le32'
#define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
  ^
   include/linux/qed/qed_chain.h:23:45: note: in expansion of macro 'DMA_HI'
#define DMA_HI_LE(x)cpu_to_le32(DMA_HI(x))
^
   include/linux/qed/qed_chain.h:436:26: note: in expansion of macro 'DMA_HI_LE'
  p_next->next_phys.hi = DMA_HI_LE(p_chain->p_phys_addr);
 ^
   drivers/net/ethernet/qlogic/qed/qed_l2.c: In function 
'qed_sp_eth_rxq_start_ramrod':
   include/linux/qed/qed_chain.h:20:58: warning: right shift count >= width of 
type [-Wshift-count-overflow]
#define DMA_HI(x)   ((u32)(((dma_addr_t)(x)) >> 32))
 ^
   include/uapi/linux/byteorder/little_endian.h:32:51: note: in definition of 
macro '__cpu_to_le32'
#define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
  ^
   include/linux/qed/qed_chain.h:23:45: note: in expansion of macro 'DMA_HI'
#define DMA_HI_LE(x)cpu_to_le32(DMA_HI(x))
^
>> drivers/net/ethernet/qlogic/qed/qed_l2.c:576:25: note: in expansion of macro 
>> 'DMA_HI_LE'
 p_ramrod->bd_base.hi = DMA_HI_LE(bd_chain_phys_addr);
^
   include/linux/qed/qed_chain.h:20:58: warning: right shift count >= width of 
type [-Wshift-count-overflow]
#define DMA_HI(x)   ((u32)(((dma_addr_t)(x)) >> 32))
 ^
   include/uapi/linux/byteorder/little_endian.h:32:51: note: in definition of 
macro '__cpu_to_le32'
#define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
  ^
   include/linux/qed/qed_chain.h:23:45: note: in expansion of macro 'DMA_HI'
#define DMA_HI_LE(x)cpu_to_le32(DMA_HI(x))
^
   drivers/net/ethernet/qlogic/qed/qed_l2.c:580:30: note: in expansion of macro 
'DMA_HI_LE'
 p_ramrod->cqe_pbl_addr.hi = DMA_HI_LE(cqe_pbl_addr);
 ^
   drivers/net/ethernet/qlogic/qed/qed_l2.c: In function 
'qed_sp_eth_txq_start_ramrod':
   include/linux/qed/qed_chain.h:20:58: warning: right shift count >= width of 
type [-Wshift-count-overflow]
#define DMA_HI(x)   ((u32)(((dma_addr_t)(x)) >> 32))
 ^
   include/uapi/linux/byteorder/little_endian.h:32:51: note: in definition of 
macro '__cpu_to_le32'
#define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
  ^
   include/linux/qed/qed_chain.h:23:45: note: in expansion of macro 'DMA_HI'
#define DMA_HI_LE(x)cpu_to_le32(DMA_HI(x))
^
   drivers/net/ethernet/qlogic/qed/qed_l2.c:751:31: note: in expansion of macro 
'DMA_HI_LE'
 p_ramrod->pbl_base_addr.hi = DMA_HI_LE(pbl_addr);
  ^

vim +/DMA_HI_LE +576 drivers/net/ethernet/qlogic/qed/qed_l2.c

   560   NULL);
   561
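
The -Wshift-count-overflow warnings above arise because, on this i386 configuration, dma_addr_t is 32 bits wide, so shifting it right by 32 is not a valid shift. A common way to avoid this is to split the shift into two 16-bit steps; the kernel's upper_32_bits() and lower_32_bits() helpers are built on that idea. The sketch below shows the idiom in userspace C. The macro and type names are hypothetical stand-ins, and this is not necessarily the fix the driver authors chose.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit shifts are well defined for 32-bit and 64-bit operands alike;
 * for a 32-bit operand the result is simply 0. (Hypothetical names.) */
#define UPPER_32(n) ((uint32_t)(((n) >> 16) >> 16))
#define LOWER_32(n) ((uint32_t)(n))

/* Stand-ins for a DMA address type that is 32 or 64 bits wide. */
typedef uint32_t dma_addr32_model_t;
typedef uint64_t dma_addr64_model_t;

/* High half when the address type is only 32 bits wide. */
static uint32_t hi_of_32bit_addr(dma_addr32_model_t addr)
{
	return UPPER_32(addr);
}

/* High half when the address type is 64 bits wide. */
static uint32_t hi_of_64bit_addr(dma_addr64_model_t addr)
{
	return UPPER_32(addr);
}

/* Low half of a 64-bit address. */
static uint32_t lo_of_64bit_addr(dma_addr64_model_t addr)
{
	return LOWER_32(addr);
}
```

With a 32-bit address type the high half is always zero, and the compiler has no out-of-range shift to complain about.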

[PATCH net-next 2/3] RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune

2015-09-30 Thread Sowmini Varadhan

Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
defeats efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency. The default values for these parameters actually
result in significantly better performance.

With this patch, request-response tests using rds-stress with a
packet size of 100K and 16 threads (test parameters -q 10 -a 256
-t16 -d16) between a single pair of IP addresses achieve a
throughput of 6-8 Gbps. Without this patch, throughput maxes out
at 2-3 Gbps under equivalent conditions on these platforms.

Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |   16 
 1 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c42b60b..9d6ddba 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -67,21 +67,13 @@ void rds_tcp_nonagle(struct socket *sock)
set_fs(oldfs);
 }
 
+/* All module specific customizations to the RDS-TCP socket should be done in
+ * rds_tcp_tune() and applied after socket creation. In general these
+ * customizations should be tunable via module_param()
+ */
 void rds_tcp_tune(struct socket *sock)
 {
-   struct sock *sk = sock->sk;
-
rds_tcp_nonagle(sock);
-
-   /*
-* We're trying to saturate gigabit with the default,
-* see svc_sock_setbufsize().
-*/
-   lock_sock(sk);
-   sk->sk_sndbuf = RDS_TCP_DEFAULT_BUFSIZE;
-   sk->sk_rcvbuf = RDS_TCP_DEFAULT_BUFSIZE;
-   sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
-   release_sock(sk);
 }
 
 u32 rds_tcp_snd_nxt(struct rds_tcp_connection *tc)
-- 
1.7.1


[PATCH net-next 0/3] RDS: RDS-TCP perf enhancements

2015-09-30 Thread Sowmini Varadhan

A 3-part patchset that (a) improves current RDS-TCP perf
by 2X-3X and (b) refactors earlier robustness code for
better observability/scaling.

Patch 1 is an enhancement of earlier robustness fixes
that had used separate sockets for client and server endpoints to
resolve race conditions. It is possible to have an equivalent
solution that does not use 2 sockets. The benefit of a
single socket solution is that it results in more predictable
and observable behavior for the underlying TCP pipe of an
RDS connection.

Patches 2 and 3 are simple, straightforward perf bug fixes
that align the RDS TCP socket with other parts of the kernel stack.

Sowmini Varadhan (3):
  Use a single TCP socket for both send and receive.
  Do not bloat sndbuf/rcvbuf in rds_tcp_tune
  Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in
rds_tcp_xmit

 net/rds/connection.c |   22 ++
 net/rds/rds.h|4 +++-
 net/rds/tcp.c|   16 
 net/rds/tcp_listen.c |   19 +++
 net/rds/tcp_send.c   |8 +++-
 5 files changed, 27 insertions(+), 42 deletions(-)


[PATCH net-next 3/3] RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit

2015-09-30 Thread Sowmini Varadhan

For the same reasons as 2f53384424 and 35f9c09fe9, rds_tcp_xmit
may have multiple pages to send, so use MSG_MORE and
MSG_SENDPAGE_NOTLAST as hints to tcp_sendpage().

Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp_send.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 53b17ca..5f3e3fa 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -83,6 +83,7 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
struct rds_tcp_connection *tc = conn->c_transport_data;
int done = 0;
int ret = 0;
+   int more;
 
if (hdr_off == 0) {
/*
@@ -116,12 +117,15 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
goto out;
}
 
+   more = (rm->data.op_nents > 1 ? (MSG_MORE | MSG_SENDPAGE_NOTLAST) : 0);
while (sg < rm->data.op_nents) {
+   int flags = (MSG_DONTWAIT | MSG_NOSIGNAL | more);
+
ret = tc->t_sock->ops->sendpage(tc->t_sock,
sg_page(&rm->data.op_sg[sg]),
rm->data.op_sg[sg].offset + off,
rm->data.op_sg[sg].length - off,
-   MSG_DONTWAIT|MSG_NOSIGNAL);
+   flags);
rdsdebug("tcp sendpage %p:%u:%u ret %d\n", (void 
*)sg_page(&rm->data.op_sg[sg]),
 rm->data.op_sg[sg].offset + off, 
rm->data.op_sg[sg].length - off,
 ret);
@@ -134,6 +138,8 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
off = 0;
sg++;
}
+   if (sg == rm->data.op_nents - 1)
+   more = 0;
}
 
 out:
-- 
1.7.1
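
The intent of the hint handling in this patch can be summarised as "every scatterlist entry except the last one carries the more-data hints". The sketch below models that per-entry decision in userspace C. HINT_MORE and HINT_NOTLAST are hypothetical stand-ins for MSG_MORE and MSG_SENDPAGE_NOTLAST, hints_for_entry() is an illustrative helper that does not exist in the patch, and partial sends or resumed transmissions are deliberately ignored.

```c
#include <assert.h>

#define HINT_MORE    0x1	/* hypothetical stand-in for MSG_MORE */
#define HINT_NOTLAST 0x2	/* hypothetical stand-in for MSG_SENDPAGE_NOTLAST */

/* Returns the hints to pass when sending entry 'sg' out of 'nents'
 * entries: both hints while further entries follow, none for the last
 * entry or for a single-entry message. */
static int hints_for_entry(unsigned int sg, unsigned int nents)
{
	int hints = 0;

	assert(sg < nents);	/* caller must pass a valid entry index */
	if (nents > 1 && sg + 1 < nents)
		hints = HINT_MORE | HINT_NOTLAST;
	return hints;
}
```

For a three-entry message this yields the hints for entries 0 and 1 and no hints for entry 2, so the final page is the one that lets the stack push the data out.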


[PATCH net-next 1/3] net/rds: Use a single TCP socket for both send and receive.

2015-09-30 Thread Sowmini Varadhan

Commit f711a6ae062c ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port.  The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.

However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both sends and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().

A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection attempt.

Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c |   22 ++
 net/rds/rds.h|4 +++-
 net/rds/tcp_listen.c |   19 +++
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 49adeef..d456403 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -128,10 +128,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
struct rds_transport *loop_trans;
unsigned long flags;
int ret;
-   struct rds_transport *otrans = trans;
 
-   if (!is_outgoing && otrans->t_type == RDS_TRANS_TCP)
-   goto new_conn;
rcu_read_lock();
conn = rds_conn_lookup(net, head, laddr, faddr, trans);
if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
@@ -147,7 +144,6 @@ static struct rds_connection *__rds_conn_create(struct net *net,
if (conn)
goto out;
 
-new_conn:
conn = kmem_cache_zalloc(rds_conn_slab, gfp);
if (!conn) {
conn = ERR_PTR(-ENOMEM);
@@ -207,6 +203,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 
atomic_set(&conn->c_state, RDS_CONN_DOWN);
conn->c_send_gen = 0;
+   conn->c_outgoing = (is_outgoing ? 1 : 0);
conn->c_reconnect_jiffies = 0;
INIT_DELAYED_WORK(&conn->c_send_w, rds_send_worker);
INIT_DELAYED_WORK(&conn->c_recv_w, rds_recv_worker);
@@ -243,22 +240,13 @@ static struct rds_connection *__rds_conn_create(struct net *net,
/* Creating normal conn */
struct rds_connection *found;
 
-   if (!is_outgoing && otrans->t_type == RDS_TRANS_TCP)
-   found = NULL;
-   else
-   found = rds_conn_lookup(net, head, laddr, faddr, trans);
+   found = rds_conn_lookup(net, head, laddr, faddr, trans);
if (found) {
trans->conn_free(conn->c_transport_data);
kmem_cache_free(rds_conn_slab, conn);
conn = found;
} else {
-   if ((is_outgoing && otrans->t_type == RDS_TRANS_TCP) ||
-   (otrans->t_type != RDS_TRANS_TCP)) {
-   /* Only the active side should be added to
-* reconnect list for TCP.
-*/
-   hlist_add_head_rcu(&conn->c_hash_node, head);
-   }
+   hlist_add_head_rcu(&conn->c_hash_node, head);
rds_cong_add_conn(conn);
rds_conn_count++;
}
@@ -337,7 +325,9 @@ void rds_conn_shutdown(struct rds_connection *conn)
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
-   rds_queue_reconnect(conn);
+   if (conn->c_trans->t_type != RDS_TRANS_TCP ||
+   conn->c_outgoing == 1)
+   rds_queue_reconnect(conn);
} else {
rcu_read_unlock();
}
diff --git a/net/rds/rds.h b/net/rds/rds.h
index afb4048..b4c7ac0 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -86,7 +86,9 @@ struct rds_connection {
struct hlist_node   c_hash_node;
__be32  c_laddr;
__be32  c_faddr;
-   unsigned intc_loopback:1;
+   unsigned intc_loopback:1,
+   c_outgoing:1,
+   c_pad_to_32:30;

[PATCH] qede: fix simple_return.cocci warnings

2015-09-30 Thread kbuild test robot

drivers/net/ethernet/qlogic/qede/qede_main.c:2141:1-3: WARNING: end returns can 
be simpified and declaration on line 2134 can be dropped

 Simplify a trivial if-return sequence.  Possibly combine with a
 preceding function call.

Generated by: scripts/coccinelle/misc/simple_return.cocci

CC: Sudarsana Kalluru 
Signed-off-by: Fengguang Wu 
---

 qede_main.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -2131,18 +2131,13 @@ err0:
 /* called with rtnl_lock */
 static int qede_open(struct net_device *ndev)
 {
-   int rc;
struct qede_dev *edev = netdev_priv(ndev);
 
netif_carrier_off(ndev);
 
edev->ops->common->set_power_state(edev->cdev, PCI_D0);
 
-   rc = qede_load(edev, QEDE_LOAD_NORMAL);
-   if (rc)
-   return rc;
-
-   return 0;
+   return qede_load(edev, QEDE_LOAD_NORMAL);
 }
 
 static int qede_close(struct net_device *ndev)
