date:20170209

net: hix5hd2_gmac uninitialized net_device

2017-02-09 Thread Marty Plummer

On Fri, Feb 10, 2017 at 01:41:18AM -0600, Marty Plummer wrote:
> Greetings.
> 
> I think I may have found a bug with the hix5hd2_gmac driver; unless I'm
> missing something, it appears that somehow the net_device struct is not
> being initialized properly in the hix5hd2_dev_probe function.
> 
> Having set up my devicetree properly (I hope, still new to this), I first
> received an error when inserting the module:
> "(unnamed net_device) (uninitialized): No irq resource"
> while I very clearly have the interrupts property defined within this node.
> 
> Removing the phy-handle node for testing purposes, I get a similar message:
> "(unnamed net_device) (uninitialized): not find phy-handle"
> 
> So, it seems to my (admittedly inexperienced) mind that the ndev pointer is
> not being initialized properly, or that the error checking at line 
> is not functioning properly either, for it to have gotten so far along
> into the function, only to fail at the attempt to access the ndev pointer.
> 
> If you require more information from me, please let me know.
> 
> Marty

Sorry, forgot the subject. Still getting the hang of mutt.

[no subject]

2017-02-09 Thread Marty Plummer

Greetings.

I think I may have found a bug with the hix5hd2_gmac driver; unless I'm
missing something, it appears that somehow the net_device struct is not
being initialized properly in the hix5hd2_dev_probe function.

Having set up my devicetree properly (I hope, still new to this), I first
received an error when inserting the module:
"(unnamed net_device) (uninitialized): No irq resource"
while I very clearly have the interrupts property defined within this node.

Removing the phy-handle node for testing purposes, I get a similar message:
"(unnamed net_device) (uninitialized): not find phy-handle"

So, it seems to my (admittedly inexperienced) mind that the ndev pointer is
not being initialized properly, or that the error checking at line 
is not functioning properly either, for it to have gotten so far along
into the function, only to fail at the attempt to access the ndev pointer.

If you require more information from me, please let me know.

Marty

Re: [PATCH RFC ipsec-next 5/5] esp: Add a software GRO codepath

2017-02-09 Thread Steffen Klassert

On Wed, Feb 08, 2017 at 01:19:56PM -0500, David Miller wrote:
> From: Steffen Klassert 
> Date: Tue, 7 Feb 2017 10:14:11 +0100
> 
> > +static struct sk_buff **esp4_gro_receive(struct sk_buff **head,
> > +struct sk_buff *skb)
> > +{
> > +   int err;
> > +   __be32 seq;
> > +   __be32 spi;
> > +   struct xfrm_state *x;
> > +   int offset = skb_gro_offset(skb);
> 
> Please order local variable declarations from longest to shortest.

No problem, will do.

Thanks for looking over it!

Re: [PATCH ipsec-next 0/6] xfrm: policy: make policy backend const

2017-02-09 Thread Steffen Klassert

On Tue, Feb 07, 2017 at 03:00:13PM +0100, Florian Westphal wrote:
> Hi Steffen,
> 
> This series removes all places that need to write to the afinfo policy
> backends.  This then allows us to make the structures const.

All applied to ipsec-next, thanks a lot Florian!

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Joe Perches

On Thu, 2017-02-09 at 23:14 -0800, Adrian Chadd wrote:

> If there
> were accessors for the skb data / len fields (like we do for mbufs)
> then porting the code would've involved about 5,000 less changed
> lines.

What generic mechanisms would you suggest to make
porting easier between bsd and linux and what in
your opinion are the best naming schemes to make
these functions easiest to read and implement
without resorting to excessive identifier lengths?

If you have some, please provide examples.

Re: [PATCH ipsec-next] xfrm: input: constify xfrm_input_afinfo

2017-02-09 Thread Steffen Klassert

On Tue, Feb 07, 2017 at 02:52:30PM +0100, Florian Westphal wrote:
> Nothing writes to these structures (the module owner was not used).
> 
> While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
> (INET6), not AF_MAX.
> 
> Signed-off-by: Florian Westphal 

Applied to ipsec-next, thanks!

Re: [PATCH ipsec] xfrm: policy: init locks early

2017-02-09 Thread Steffen Klassert

On Wed, Feb 08, 2017 at 11:52:29AM +0100, Florian Westphal wrote:
> Dmitry reports following splat:
>  INFO: trying to register non-static key.
>  the code is fine but needs lockdep annotation.
>  turning off the locking correctness validator.
>  CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
> [..]
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
>  xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
>  xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
>  ops_init+0x10a/0x530 net/core/net_namespace.c:115
>  setup_net+0x2ed/0x690 net/core/net_namespace.c:291
>  copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
>  create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
>  SYSC_unshare kernel/fork.c:2281 [inline]
> 
> Problem is that when we get error during xfrm_net_init we will call
> xfrm_policy_fini which will acquire xfrm_policy_lock before it was
> initialized.  Just move it around so locks get set up first.
> 
> Reported-by: Dmitry Vyukov 
> Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks")
> Signed-off-by: Florian Westphal 

Applied, thanks everyone!
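The ordering problem described in the quoted patch can be modelled in a few lines of userspace C. This is only a sketch: `model_lock`, `model_net`, `model_fini` and `model_net_init` are hypothetical stand-ins, not the xfrm code, and the "lock" merely records whether it was touched before being initialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock model: records whether it was initialised before use. */
struct model_lock {
	bool initialized;
	bool used_uninitialized;
};

struct model_net {
	struct model_lock policy_lock;
};

static void model_lock_init(struct model_lock *l)
{
	l->initialized = true;
}

static void model_lock_take_release(struct model_lock *l)
{
	if (!l->initialized)
		l->used_uninitialized = true;	/* the case lockdep complained about */
}

/* Teardown always takes the lock, as the fini/flush path in the trace does. */
static void model_fini(struct model_net *net)
{
	model_lock_take_release(&net->policy_lock);
}

/*
 * Init with the lock set up first.  fail_step simulates an error in a
 * later setup step; the error path runs the teardown routine, which is
 * only safe because the lock already exists at that point.
 */
static int model_net_init(struct model_net *net, bool fail_step)
{
	assert(net != NULL);

	net->policy_lock.initialized = false;
	net->policy_lock.used_uninitialized = false;

	model_lock_init(&net->policy_lock);	/* done before anything can fail */

	if (fail_step) {
		model_fini(net);
		return -1;
	}
	return 0;
}
```

Calling `model_net_init(&net, true)` exercises the error path; with the lock initialised first, `used_uninitialized` stays false.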

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Eric Dumazet

On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?
>

We probably have a dozen bugs to fix in af_packet.c.

The race in fanout_add() is one of them.

I do not believe Anoob Soman sent his fixes btw ...

(Look for this thread: http://marc.info/?l=linux-netdev&m=148588680525648&w=2)
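The check-then-lock race described in the quoted text (two CPUs both see po->fanout as NULL, then take the mutex) is usually avoided by testing the state only while the lock is held. The following userspace sketch shows that general pattern with a pthread mutex. It is not the af_packet fix, which does not appear in this thread; `model_sock`, `model_fanout_mutex` and `model_fanout_add` are hypothetical names, and returning -EALREADY is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the per-socket state. */
struct model_sock {
	void *fanout;	/* non-NULL once the socket has joined a group */
};

static pthread_mutex_t model_fanout_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Join a fanout group.  The "already joined?" test runs only after the
 * mutex is held, so two callers racing on the same socket cannot both
 * observe fanout == NULL and both proceed.
 */
static int model_fanout_add(struct model_sock *po, void *group)
{
	int err = 0;

	assert(po != NULL && group != NULL);

	pthread_mutex_lock(&model_fanout_mutex);
	if (po->fanout)
		err = -EALREADY;	/* assumed error code for this sketch */
	else
		po->fanout = group;
	pthread_mutex_unlock(&model_fanout_mutex);

	return err;
}
```

A second call on the same socket then fails cleanly instead of overwriting or freeing state that the first caller set up.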

linux-next: manual merge of the akpm tree with the net tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  drivers/net/usb/sierra_net.c

between commit:

  5a70348e1187 ("sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications")

from the net tree and patch:

  "lib/vsprintf.c: remove %Z support"

from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/usb/sierra_net.c
index d9440bc022f2,88ace5024306..
--- a/drivers/net/usb/sierra_net.c
+++ b/drivers/net/usb/sierra_net.c
@@@ -376,11 -349,11 +376,11 @@@ static inline int sierra_net_is_valid_a
  static int sierra_net_parse_lsi(struct usbnet *dev, char *data, int datalen)
  {
struct lsi_umts *lsi = (struct lsi_umts *)data;
 +  u32 expected_length;
  
 -  if (datalen < sizeof(struct lsi_umts)) {
 -  netdev_err(dev->net, "%s: Data length %d, exp %zu\n",
 -  __func__, datalen,
 -  sizeof(struct lsi_umts));
 +  if (datalen < sizeof(struct lsi_umts_single)) {
-   netdev_err(dev->net, "%s: Data length %d, exp >= %Zu\n",
++  netdev_err(dev->net, "%s: Data length %d, exp >= %zu\n",
 + __func__, datalen, sizeof(struct lsi_umts_single));
return -1;
}
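The resolution above keeps the net tree's message but switches the size_t conversion from the kernel-only %Zu to the standard %zu. A small userspace sketch of the same format string follows; `model_lsi` is a hypothetical stand-in for struct lsi_umts_single with an invented 12-byte layout, and `model_format_len_error` is not a driver function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct lsi_umts_single; the layout is invented. */
struct model_lsi {
	unsigned char bytes[12];
};

/*
 * Format the "data too short" message.  sizeof yields a size_t, and
 * %zu is the standard C99 conversion for that type.
 */
static int model_format_len_error(char *buf, size_t buflen, int datalen)
{
	assert(buf != NULL);

	return snprintf(buf, buflen, "Data length %d, exp >= %zu",
			datalen, sizeof(struct model_lsi));
}
```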

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Adrian Chadd

On 9 February 2017 at 23:03, Valo, Kalle  wrote:
> Ben Greear  writes:
>
>> On 02/07/2017 01:14 AM, Valo, Kalle wrote:
>>> Adrian Chadd  writes:
>>>
>>>> Removing this method makes the diff to FreeBSD larger, as "vif" in
>>>> FreeBSD is a different pointer.
>>>>
>>>> (Yes, I have ath10k on freebsd working and I'd like to find a way to
>>>> reduce the diff moving forward.)
>>>
>>> I don't like this "(void *) vif->drv_priv" style that much either but
>>> apparently it's commonly used in Linux wireless code and already parts
>>> of ath10k. So this patch just unifies the coding style.
>>
>> Surely the code compiles to the same thing, so why add a patch that
>> makes it more difficult for Adrian and makes the code no easier to read
>> for the rest of us?
>
> Because that's the coding style used already in Linux. It's great to see
> that parts of ath10k can be used also in other systems but in principle
> I'm not very fond of the idea starting to reject valid upstream patches
> because of driver forks.
>
> I think backports project is doing it right, it's not limiting upstream
> development in any way and handles all the API changes internally. Maybe
> FreeBSD could do something similar?

I tried, but ... well, imagine renaming vif->drv_priv to something
else. That's what you're suggesting. :-) You can do it with
coccinelle, but not via just backports API implementations. I'm a big
fan of lightweight accessor APIs for the same reason.

(Since FreeBSD doesn't have that pointer in ieee80211vap, it's done a
different way.)

If you could convert other direct uses over to ath10k_vif_to_arvif()
then that'd make me happier. If not, it's fine, when I push this into
freebsd and fast-forward commits, I'll have to just maintain it.

For what it's worth - the linux skb accessors are the same. If there
were accessors for the skb data / len fields (like we do for mbufs)
then porting the code would've involved about 5,000 less changed
lines.

-adrian
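The accessor argument can be illustrated with a reduced userspace sketch. All names here (`model_vif`, `model_arvif`, `model_vif_to_arvif`, `model_vdev_id`) are hypothetical, and the private data is modelled as a plain pointer rather than the way mac80211 actually lays it out. The point is only that callers never touch `drv_priv` directly, so a port with a different "vif" type would change one function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver-private per-interface state. */
struct model_arvif {
	int vdev_id;
};

/* Hypothetical stack-owned interface object (simplified to a pointer). */
struct model_vif {
	int type;
	void *drv_priv;
};

/* The single place that knows how to get from a vif to driver state. */
static struct model_arvif *model_vif_to_arvif(struct model_vif *vif)
{
	assert(vif != NULL);
	return vif->drv_priv;
}

/* Callers go through the accessor instead of casting vif->drv_priv. */
static int model_vdev_id(struct model_vif *vif)
{
	return model_vif_to_arvif(vif)->vdev_id;
}
```

With this shape, replacing the body of `model_vif_to_arvif` is the only change needed when the container type differs between systems.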

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Valo, Kalle

Ben Greear  writes:

> On 02/07/2017 01:14 AM, Valo, Kalle wrote:
>> Adrian Chadd  writes:
>>
>>> Removing this method makes the diff to FreeBSD larger, as "vif" in
>>> FreeBSD is a different pointer.
>>>
>>> (Yes, I have ath10k on freebsd working and I'd like to find a way to
>>> reduce the diff moving forward.)
>>
>> I don't like this "(void *) vif->drv_priv" style that much either but
>> apparently it's commonly used in Linux wireless code and already parts
>> of ath10k. So this patch just unifies the coding style.
>
> Surely the code compiles to the same thing, so why add a patch that
> makes it more difficult for Adrian and makes the code no easier to read
> for the rest of us?

Because that's the coding style used already in Linux. It's great to see
that parts of ath10k can be used also in other systems but in principle
I'm not very fond of the idea starting to reject valid upstream patches
because of driver forks.

I think backports project is doing it right, it's not limiting upstream
development in any way and handles all the API changes internally. Maybe
FreeBSD could do something similar?

-- 
Kalle Valo

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Jason Wang

On 2017-02-10 10:30, Tom Herbert wrote:
> On Thu, Feb 9, 2017 at 5:48 PM, David Miller wrote:
>> From: Tom Herbert
>> Date: Thu, 9 Feb 2017 15:08:22 -0800
>>
>>> Okay, how about this... I'll add a configuration option like
>>> XDP_ALLOW_OTHER_HOOKS. The default will be to disallow setting any
>>> hook other than a BPF. If it is set, then we'll accept other hooks
>>> to be run. This way mostly restrict the interface by default, but
>>> still allow experimentation with other hook types like I need with
>>> TXDP or maybe the netfilter guys might want to fastpath netfilter
>>> etc. When we bring a working robust implementation to netdev that
>>> shows clear benefits then we can add those to BPF as the "allowed"
>>> hooks at that time. So this strictly controls the interfaces, but
>>> still also allows room for innovation.
>>
>> Anyone is allowed to "innovate" in their own private kernel tree.
>>
>> But I'm not unleashing that upstream.
>>
>> The only reason I accepted XDP is entirely because it is limited
>> in scope to eBPF.  All eBPF programs execute in finite time,
>> cannot loop, cannot deadlock, cannot access arbitrary pieces
>> of kernel memory and datastructures.
>>
>> It is a well defined, constrained, and incredibly tightly controlled
>> execution environment for implementing policy, monitoring and control.
>
> And it's also incredibly invasive in the core data path of drivers.
> TBH it is not clear to me that the narrow use cases for XDP justify
> adding this complexity to every driver.

XDP is valuable for fast userspace forwarding (e.g. macvtap passthrough
mode). I hope we can leave a window for this. Or we may need to introduce
other similar hooks.

Thanks

> In any case, I withdraw the patch set.
>
> Tom

[PATCH net 1/1] net: fec: fix multicast filtering hardware setup

2017-02-09 Thread Andy Duan

Fix hardware setup of multicast address hash:
- Never clear the hardware hash (to avoid packet loss)
- Construct the hash register values in software and then write once
to hardware

Signed-off-by: Fugang Duan 
Signed-off-by: Rui Sousa 
---
 drivers/net/ethernet/freescale/fec_main.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2cc552d..91a1664 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2910,6 +2910,7 @@ static void set_multicast_list(struct net_device *ndev)
struct netdev_hw_addr *ha;
unsigned int i, bit, data, crc, tmp;
unsigned char hash;
+   unsigned int hash_high = 0, hash_low = 0;
 
if (ndev->flags & IFF_PROMISC) {
tmp = readl(fep->hwp + FEC_R_CNTRL);
@@ -2932,11 +2933,7 @@ static void set_multicast_list(struct net_device *ndev)
return;
}
 
-   /* Clear filter and add the addresses in hash register
-*/
-   writel(0, fep->hwp + FEC_GRP_HASH_TABLE_HIGH);
-   writel(0, fep->hwp + FEC_GRP_HASH_TABLE_LOW);
-
+   /* Add the addresses in hash register */
netdev_for_each_mc_addr(ha, ndev) {
/* calculate crc32 value of mac address */
crc = 0xffffffff;
@@ -2954,16 +2951,14 @@ static void set_multicast_list(struct net_device *ndev)
 */
hash = (crc >> (32 - FEC_HASH_BITS)) & 0x3f;
 
-   if (hash > 31) {
-   tmp = readl(fep->hwp + FEC_GRP_HASH_TABLE_HIGH);
-   tmp |= 1 << (hash - 32);
-   writel(tmp, fep->hwp + FEC_GRP_HASH_TABLE_HIGH);
-   } else {
-   tmp = readl(fep->hwp + FEC_GRP_HASH_TABLE_LOW);
-   tmp |= 1 << hash;
-   writel(tmp, fep->hwp + FEC_GRP_HASH_TABLE_LOW);
-   }
+   if (hash > 31)
+   hash_high |= 1 << (hash - 32);
+   else
+   hash_low |= 1 << hash;
}
+
+   writel(hash_high, fep->hwp + FEC_GRP_HASH_TABLE_HIGH);
+   writel(hash_low, fep->hwp + FEC_GRP_HASH_TABLE_LOW);
 }
 
 /* Set a MAC change in hardware. */
-- 
1.9.1
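The idea of the patch above (accumulate both hash words in software, then store each register once, without clearing in between) can be sketched in userspace C. This is an illustration only: `fec_hash_regs` is a hypothetical stand-in for the two hardware registers, `fec_write_mc_hash` is not a driver function, and the 6-bit hash indices are taken as input rather than derived from a CRC of the MAC address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit group hash registers. */
struct fec_hash_regs {
	uint32_t high;
	uint32_t low;
};

/*
 * Build both register values from a list of 6-bit hash indices (0..63),
 * then store each register exactly once.  The registers are never
 * zeroed in between, so there is no window in which an address that
 * should be accepted is filtered out.
 */
static void fec_write_mc_hash(struct fec_hash_regs *regs,
			      const uint8_t *hashes, size_t n)
{
	uint32_t hash_high = 0, hash_low = 0;
	size_t i;

	assert(regs != NULL);

	for (i = 0; i < n; i++) {
		uint8_t hash = hashes[i] & 0x3f;

		/* unsigned constant: bit 31 is set without signed overflow */
		if (hash > 31)
			hash_high |= UINT32_C(1) << (hash - 32);
		else
			hash_low |= UINT32_C(1) << hash;
	}

	regs->high = hash_high;	/* models the single write to the HIGH register */
	regs->low = hash_low;	/* models the single write to the LOW register */
}
```

For example, indices 0, 31, 32 and 63 set the lowest and highest bit of each word.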

Re: loopback device reference count leakage

2017-02-09 Thread Kaiwen Xu

I am using a macvlan device inside the container, with the following Docker
network plugin:

https://github.com/gopher-net/macvlan-docker-plugin

Each macvlan device, which gets assigned into the container network
namespace, is attached to host's vlan device, which is then attached to
host's eth0.

eth0  <==  eth0.1000  <==  macvlan0 (host macvlan device)
                      \==  macvlan1 (container macvlan device)
                      \==  macvlan2 (container macvlan device)
                      ...

eth0 has a 10.x.x.x/24 IP address. eth0.1000 is able to use any of the
addresses in another 10.x.x.y/24 range (different from the /24 assigned to
eth0), but itself isn't directly assigned an IP address. macvlan0, which
is on the host, is assigned an IP address in the 10.x.x.y/24 range that
belongs to eth0.1000. When a container starts up, a new macvlan device is
created and attached to eth0.1000 with a different 10.x.x.y/24 address,
and is then moved into the container's network namespace. The container's
10.x.x.y/24 address is directly reachable from outside the host.

Thanks,
Kaiwen

On Wed, Feb 08, 2017 at 01:50:57PM -0800, Cong Wang wrote:
> On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu  wrote:
> > Hi Cong,
> >
> > I did some more testing, seems like your second assumption is correct.
> > There is indeed something holding references to a particular dst,
> > which prevents it from being gc'ed.
> 
> Excellent!
> 
> >
> > I added logging to each dst_hold (or dst_hold_safe, or
> > skb_dst_force_safe) and dst_release, which formatted as following:
> >
> >  () []: dst_release / dst_hold ...  
> > 
> >
> > And inside dst_gc_task(), I added logging when gc delay occurred,
> > formatted as:
> >
> > [dst_gc_task]  (): delayed 
> >
> > I have the log attached.
> 
> The following line looks suspicious:
> 
> Feb  6 16:27:24  kernel: [63589.458067] [dst_gc_task]
> lodebug (2): delayed 19
> 
> Looks like you ended up having one dst whose refcnt is 19 in GC,
> and this lasted for a rather long time for some reason.
> 
> It is hard to know if it is a refcnt leak even with your log, since there were
> 4K+ refcnt'ing happened on that dst...
> 
> Meanwhile, can you share your setup of your container? What network device
> do you use in your container? How is it connected to outside?
> 
> Thanks.

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 7:33 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 9 Feb 2017 18:29:54 -0800
>
>> So we have thousands of LOC coming into drivers every day anyway with
>> all those properties anyway, so this "restricted" environment solves
>> at best 1% of the problem.
>
> What you must understand is that no matter what someone outside of
> upstream writes into an eBPF program, it's safe, and we can absolutely
> prove this with the verifier and the invariants of the execution
> environment.
>
This is the exact same argument the userspace stack proponents will
use-- put your stack in userspace and you can't crash the host. But
just like eBPF that does not at all mean the logic of the program is
correct. Getting into a mode where we drop every packet, or checksums
are miscomputed, or a protocol field is miswritten is entirely
possible. The value of coding in the Linux kernel, maybe the only
truly relevant point compared to userspace stacks, is the scrutiny,
the testing, the debugging, and the eyes of experts we get to look at
every line going into the kernel to avoid such problems. Even though
there's the possibility of crashing or deadlocking the system, I would
absolutely put the quality of kernel code over _any_ piece of
userspace code _any_ day of the week. Maybe some day we'll see a
process for XDP/BPF for reviewing and accepting code and you along
with several established experts on netdev will be earnestly
reviewing such code, but until then I am more inclined to stick with
writing kernel code for anything other than simple things that are
amenable to BPF. The problem with kernel bypass is not just that it
bypasses the well-written and well-tested kernel code, but that it
also bypasses the process.

Tom

Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions

2017-02-09 Thread Alexei Starovoitov

On Thu, Feb 09, 2017 at 12:25:37PM +0100, Daniel Borkmann wrote:
>
> Correct, the overlap both use-cases share is the dump itself. It needs
> to be in such a condition for CRIU, that it can be reloaded eventually,

I don't think it makes sense to drag criu into this discussion.
I expressed my take on criu in the other thread. tldr:
bpf is a graph of dependencies between programs, maps, applications
and kernel events. So to save/restore this graph one would need to solve
very hard problems of stopping multiple applications at once,
stopping kernel events and so on. I don't think it's worth going that route.

> >- Alternatively, the attach is always done by passing the FD as an
> >attribute, so the netlink dump could attach an fd to the running
> >program, return the FD as an attribute and the bpf program is retrieved
> >from the fd. This is a major departure from how dumps work with
> >processing attributes and needing to attach open files to a process will
> >be problematic. Integrating the bpf into the dump is a natural fit.
> 
> Right, I think it's a natural fit to place it into the various points/
> places where it's attached to, as we're stuck with that anyway for the
> attachment part. Meaning in cls_bpf, it would go as a mem blob into the
> netlink attribute. There would need to be a common BPF core helper that
> the various subsystem users call in order to generate that mentioned
> output format, and that resulting mem blob is then stuck into either
> nlattr, mem provided by syscall, etc.

I think if we use ten different ways to dump it, it will
complicate the user space tooling.
I'd rather see one way of doing it via new syscall command.
Pass prog_fd and it will return insns in some form.

Here is a more concrete proposal:
- add two flags to PROG_LOAD:
  BPF_F_ENFORCE_STATELESS - it will require verifier to check that program
  doesn't use maps and any other global state (doesn't use bpf_redirect,
  doesn't use bpf_set_tunnel_key and tunnel_opt)
  This will ensure that program is stateless and pure instruction
  dump is meaningful. For 'ip vrf' case it will be enough.
  BPF_F_ALLOW_DUMP - it will save original program, so in the common
  case we wouldn't need to waste memory to save program

- add new bpf syscall command BPF_PROG_DUMP
  input: prog_fd, output: insns
  it will work right away with OBJ_GET command and the user will
  be able to dump stateless programs pinned in bpffs

- add appropriate interfaces for different attach points to return prog_fd:
  for cgroup it will be new BPF_PROG_GET command.
  for socket it will be new getsockopt. (Actually BPF_PROG_GET can work
  for sockets too and probably better).
  for xdp and tc we need to find a way to return prog_fd.
  netlink is no good, since it would be very weird to install fd
  and return it async in netlink body. We can simply say that
  whoever wants to dump programs needs to first pin them in bpffs
  and then attach to tc/xdp. iproute2 already does it anyway.
  Realistically tc/xdp programs are almost always stateful, so
  dump won't be available for them anyway.

If in the future we discover a magic way of restoring maps,
we can relax prog loading part and allow BPF_F_ALLOW_DUMP to be
used independently of BPF_F_ENFORCE_STATELESS flag.
(In the beginning these two flags would need to be used together).
Also we'll be able to extend BPF_PROG_DUMP independently
of attachment points. If we introduce something like global
variable section or read-only string section (like some folks asked)
we can dump it back. Insns will not be the only section.

My main point is that right now I'd really like to avoid
dealing with stateful bits (maps, etc), since there is no
good way of dumping it, while stateless will be enough
for 'ip vrf' and simple programs.

Thoughts?
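The initial rule in the proposal (BPF_F_ALLOW_DUMP usable only together with BPF_F_ENFORCE_STATELESS) amounts to a small load-time flag check. The sketch below is purely illustrative: the flag names come from the mail, but the bit values, the helper names `model_check_load_flags` and `model_prog_dump_allowed`, and the error codes are assumptions, not an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag names as proposed in the mail; the bit values are invented here. */
#define BPF_F_ENFORCE_STATELESS	(UINT32_C(1) << 0)
#define BPF_F_ALLOW_DUMP	(UINT32_C(1) << 1)

/*
 * Load-time check for the initial rule: ALLOW_DUMP is accepted only
 * together with ENFORCE_STATELESS.  Unknown bits are rejected.
 */
static int model_check_load_flags(uint32_t flags)
{
	const uint32_t known = BPF_F_ENFORCE_STATELESS | BPF_F_ALLOW_DUMP;

	if (flags & ~known)
		return -EINVAL;
	if ((flags & BPF_F_ALLOW_DUMP) && !(flags & BPF_F_ENFORCE_STATELESS))
		return -EINVAL;
	return 0;
}

/*
 * Dump-time check: instructions are available only if the program was
 * loaded with ALLOW_DUMP (the -EPERM result is an assumption).
 */
static int model_prog_dump_allowed(uint32_t load_flags)
{
	/* precondition: the flags already passed the load-time check */
	assert(model_check_load_flags(load_flags) == 0);

	return (load_flags & BPF_F_ALLOW_DUMP) ? 0 : -EPERM;
}
```

Relaxing the rule later, as the mail suggests, would mean dropping the second test in `model_check_load_flags` while leaving the dump-time check alone.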

Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-09 Thread Jason Wang




On 2017-02-10 02:10, Michael S. Tsirkin wrote:
> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote:
>> ----- Original Message -----
>>> Hello,
>>>
>>> I've got the following report while running syzkaller fuzzer on mmotm
>>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
>>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:
>>
>> [...]
>>
>>> other info that might help us debug this:
>>>
>>>   Possible interrupt unsafe locking scenario:
>>>
>>>         CPU0                    CPU1
>>>         ----                    ----
>>>    lock(&(&r->consumer_lock)->rlock);
>>>                                 local_irq_disable();
>>>                                 lock(&(&r->producer_lock)->rlock);
>>>                                 lock(&(&r->consumer_lock)->rlock);
>>>    <Interrupt>
>>>      lock(&(&r->producer_lock)->rlock);
>>
>> Thanks a lot for the testing.
>>
>> Looks like we could address this by using skb_array_consume_bh() instead.
>>
>> Could you pls verify if the following patch works?
>
> I think we should use _bh for the produce call as well,
> since resizing takes the producer lock.

That does not look necessary, since irqs are disabled during resizing?

Thanks

Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-09 Thread Jason Wang




On 2017-02-09 18:49, Dmitry Vyukov wrote:
> On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang wrote:
>> ----- Original Message -----
>>> Hello,
>>>
>>> I've got the following report while running syzkaller fuzzer on mmotm
>>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
>>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:
>>
>> [...]
>>
>>> other info that might help us debug this:
>>>
>>>   Possible interrupt unsafe locking scenario:
>>>
>>>         CPU0                    CPU1
>>>         ----                    ----
>>>    lock(&(&r->consumer_lock)->rlock);
>>>                                 local_irq_disable();
>>>                                 lock(&(&r->producer_lock)->rlock);
>>>                                 lock(&(&r->consumer_lock)->rlock);
>>>    <Interrupt>
>>>      lock(&(&r->producer_lock)->rlock);
>>
>> Thanks a lot for the testing.
>>
>> Looks like we could address this by using skb_array_consume_bh() instead.
>>
>> Could you pls verify if the following patch works?
>
> No, I can't test it, sorry. This happened once on bots. And bots
> currently test only upstream versions.

No problem, will try to test it myself.

Thanks

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread David Miller

From: Tom Herbert 
Date: Thu, 9 Feb 2017 15:08:22 -0800

> Okay, how about this... I'll add a configuration option like
> XDP_ALLOW_OTHER_HOOKS. The default will be to disallow setting any
> hook other than a BPF. If it is set, then we'll accept other hooks
> to be run. This way mostly restrict the interface by default, but
> still allow experimentation with other hook types like I need with
> TXDP or maybe the netfilter guys might want to fastpath netfilter
> etc. When we bring a working robust implementation to netdev that
> shows clear benefits then we can add those to BPF as the "allowed"
> hooks at that time. So this strictly controls the interfaces, but
> still also allows room for innovation.

Anyone is allowed to "innovate" in their own private kernel tree.

But I'm not unleashing that upstream.

The only reason I accepted XDP is entirely because it is limited
in scope to eBPF.  All eBPF programs execute in finite time,
cannot loop, cannot deadlock, cannot access arbitrary pieces
of kernel memory and datastructures.

It is a well defined, constrained, and incredibly tightly controlled
execution environment for implementing policy, monitoring and control.

Re: [PATCH v4 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-09 Thread David Miller

From: Jarno Rajahalme 
Date: Thu,  9 Feb 2017 11:21:51 -0800

> This series improves the conntrack integration code in the openvswitch
> module by fixing outdated comments (patch 1), bugs (patches 2, 3, and
> 7), clarifying code (patches 4, 5, and 6), improving performance
> (patch 10), and adding new features enabling better translation from
> firewall admission policy to network configuration requested by user
> communities (patches 8 and 9).
> 
> Please note that v3 of the series was Acked by Pravin, but I posted a
> v4 addressing the remaining English language and coding style issues
> posted by Joe on v2.
> 
> v4: Address remaining language and coding style issues from Joe.
> v3: Rebase to the current net-next, add the comment only changing
> patch 1 and reshuffle some of the patches as requested by Joe.

Ok, I caught this v4 before pushing out v3 by mistake.

Series applied, thanks.

Re: [PATCH v3 net-next 00/10] openvswitch: Conntrack integration improvements.

2017-02-09 Thread David Miller

From: Pravin Shelar 
Date: Thu, 9 Feb 2017 08:44:55 -0800

> On Wed, Feb 8, 2017 at 5:30 PM, Jarno Rajahalme  wrote:
>> This series improves the conntrack integration code in the openvswitch
>> module by fixing outdated comments (patch 1), bugs (patches 2, 3, and
>> 7), clarifying code (patches 4, 5, and 6), improving performance
>> (patch 10), and adding new features enabling better translation from
>> firewall admission policy to network configuration requested by user
>> communities (patches 8 and 9).
>>
>> v3: Rebase to the current net-next, add the comment only changing
>> patch 1 and reshuffle some of the patches as requested by Joe.
>>
> 
> All patches look good to me.
> 
> Acked-by: Pravin B Shelar 

Series applied.

Re: [PATCH] net: myricom: myri10ge: use new api ethtool_{get|set}_link_ksettings

2017-02-09 Thread Hyong-Youb Kim

On Thu, Feb 09, 2017 at 11:17:23PM +0100, Philippe Reynes wrote:
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.

Tested using a 2-port NIC. Works fine.

Acked-by: Hyong-Youb Kim 

> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/ethernet/myricom/myri10ge/myri10ge.c |   23 +
>  1 files changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> index 1139d18..b171ed2 100644
> --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> @@ -1610,15 +1610,16 @@ static irqreturn_t myri10ge_intr(int irq, void *arg)
>  }
>  
>  static int
> -myri10ge_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
> +myri10ge_get_link_ksettings(struct net_device *netdev,
> + struct ethtool_link_ksettings *cmd)
>  {
>   struct myri10ge_priv *mgp = netdev_priv(netdev);
>   char *ptr;
>   int i;
>  
> - cmd->autoneg = AUTONEG_DISABLE;
> - ethtool_cmd_speed_set(cmd, SPEED_10000);
> - cmd->duplex = DUPLEX_FULL;
> + cmd->base.autoneg = AUTONEG_DISABLE;
> + cmd->base.speed = SPEED_10000;
> + cmd->base.duplex = DUPLEX_FULL;
>  
>   /*
>* parse the product code to deterimine the interface type
> @@ -1643,16 +1644,12 @@ static irqreturn_t myri10ge_intr(int irq, void *arg)
>   ptr++;
>   if (*ptr == 'R' || *ptr == 'Q' || *ptr == 'S') {
>   /* We've found either an XFP, quad ribbon fiber, or SFP+ */
> - cmd->port = PORT_FIBRE;
> - cmd->supported |= SUPPORTED_FIBRE;
> - cmd->advertising |= ADVERTISED_FIBRE;
> + cmd->base.port = PORT_FIBRE;
> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
>   } else {
> - cmd->port = PORT_OTHER;
> + cmd->base.port = PORT_OTHER;
>   }
> - if (*ptr == 'R' || *ptr == 'S')
> - cmd->transceiver = XCVR_EXTERNAL;
> - else
> - cmd->transceiver = XCVR_INTERNAL;
>  
>   return 0;
>  }
> @@ -1925,7 +1922,6 @@ static int myri10ge_led(struct myri10ge_priv *mgp, int on)
>  }
>  
>  static const struct ethtool_ops myri10ge_ethtool_ops = {
> - .get_settings = myri10ge_get_settings,
>   .get_drvinfo = myri10ge_get_drvinfo,
>   .get_coalesce = myri10ge_get_coalesce,
>   .set_coalesce = myri10ge_set_coalesce,
> @@ -1939,6 +1935,7 @@ static int myri10ge_led(struct myri10ge_priv *mgp, int on)
>   .set_msglevel = myri10ge_set_msglevel,
>   .get_msglevel = myri10ge_get_msglevel,
>   .set_phys_id = myri10ge_phys_id,
> + .get_link_ksettings = myri10ge_get_link_ksettings,
>  };
>  
>  static int myri10ge_allocate_rings(struct myri10ge_slice_state *ss)
> -- 
> 1.7.4.4
>

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Sowmini Varadhan

On (02/09/17 19:19), Eric Dumazet wrote:
> 
> More likely the bug is in fanout_add(), with a buggy sequence in error
> case, and not correct locking.
> 
> kfree(po->rollover);
> po->rollover = NULL;
> 
> Two cpus entering fanout_add() (using the same af_packet socket,
> syzkaller courtesy...) might both see po->fanout being NULL.
> 
> Then they grab the mutex.  Too late...

I'm not sure I follow; aiui the panic was in accessing the
sk_receive_queue.lock in a socket that had been closed earlier. I think
the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
packet delivery can be done safely, and the synchronize_net in
packet_release() makes sure that the Tx paths are quiesced before freeing
the socket.  What is the race-hole here? Does it have to do with the
_bh and softirq context, somehow?

--Sowmini

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread David Miller

From: Tom Herbert 
Date: Thu, 9 Feb 2017 18:29:54 -0800

> So we have thousands of LOC coming into drivers every day anyway with
> all those properties anyway, so this "restricted" environment solves
> at best 1% of the problem.

What you must understand is that no matter what someone outside of
upstream writes into an eBPF program, it's safe, and we can absolutely
prove this with the verifier and the invariants of the execution
environment.

Real kernel modules have no such restricted scope.

This is the fundamental issue.

Even if I agreed with you, it's tremendously frustrating that we
haven't even touched the surface of what eBPF XDP can do, and yet
you're opening the floodgates to something we cannot even prove
we need or is required yet.

XDP via eBPF in its current form needs more work and it needs to be
fully fleshed out and more user friendly.  That's where the effort
and engineering resources belong right now.

After that you can say "Ok, now we have that just about feature
complete, here is the thing that's not possible and that's why we need
X".  You think you can answer that right now, and I know that it's not
true.  There is so much that eBPF XDP can do with the right mix of
care and helper functions.  I actually really see no fundamental limit
to what it is capable of doing with the proper design.

Re: [PATCH V4 net-next 00/13] Bug Fixes in ENA driver

2017-02-09 Thread David Miller

From: Netanel Belgazal 
Date: Thu,  9 Feb 2017 15:21:26 +0200

> Changes from V3:
> * Rebase patchset to master and solve merge conflicts.
> * Remove redundant bug fix (fix error handling when probe fails)

Series applied, thank you.

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Eric Dumazet

On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:

> More likely the bug is in fanout_add(), with a buggy sequence in error
> case, and not correct locking.
> 
> kfree(po->rollover);
> po->rollover = NULL;
> 
> Two cpus entering fanout_add() (using the same af_packet socket,
> syzkaller courtesy...) might both see po->fanout being NULL.
> 
> Then they grab the mutex.  Too late...

Patch could be :

 net/packet/af_packet.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46b11fc9524e457e5fe8adf10c105a66ab6..11725a350f6953d077f754c10e9f52e48924d780 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1657,7 +1657,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		atomic_long_set(&po->rollover->num_failed, 0);
 	}
 
-	mutex_lock(&fanout_mutex);
 	match = NULL;
 	list_for_each_entry(f, &fanout_list, list) {
 		if (f->id == id &&
@@ -1704,7 +1703,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		}
 	}
 out:
-	mutex_unlock(&fanout_mutex);
 	if (err) {
 		kfree(po->rollover);
 		po->rollover = NULL;
@@ -3698,7 +3696,10 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		if (copy_from_user(&val, optval, sizeof(val)))
 			return -EFAULT;
 
-		return fanout_add(sk, val & 0xffff, val >> 16);
+		mutex_lock(&fanout_mutex);
+		ret = fanout_add(sk, val & 0xffff, val >> 16);
+		mutex_unlock(&fanout_mutex);
+		return ret;
 	}
 	case PACKET_FANOUT_DATA:
 	{

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Eric Dumazet

On Thu, 2017-02-09 at 17:24 -0800, Cong Wang wrote:
> On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov  wrote:
> > Hello,
> >
> > I've got the following use-after-free report in packet_rcv_fanout
> > while running syzkaller fuzzer on linux-next
> > e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> > is not reproducible, but maybe the stacks will allow you to figure out
> > what happens.
> >
> > BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> > kernel/locking/lockdep.c:3224 at addr ffff8801d903d538
> > Read of size 8 by task syz-executor1/10596
> > CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 
> > #1
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> >
> > Call Trace:
> >  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
> >  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
> >  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> >  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
> >  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
> >  spin_lock_bh include/linux/spinlock.h:304 [inline]
> >  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
> >  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
> >  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
> >  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
> >  xmit_one net/core/dev.c:2870 [inline]
> >  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
> >  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
> >  neigh_hh_output include/net/neighbour.h:468 [inline]
> >  dst_neigh_output include/net/dst.h:452 [inline]
> >  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
> >  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
> >  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
> >  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
> >  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
> >  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
> >  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
> >  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
> >  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
> >  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
> >  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
> >  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
> >  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
> >  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
> >  sock_sendmsg_nosec net/socket.c:633 [inline]
> >  sock_sendmsg+0xca/0x110 net/socket.c:643
> >  SYSC_sendto+0x660/0x810 net/socket.c:1685
> >  SyS_sendto+0x40/0x50 net/socket.c:1653
> >  entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> It seems in-flight packets could still refer to the struct sock pointer
> via f->arr[i]; if so, we need a sync before unlinking it:
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d56ee46..8724a98 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
> sock_prot_inuse_add(net, sk->sk_prot, -1);
> preempt_enable();
> 
> +   synchronize_net();
> +
> spin_lock(&po->bind_lock);
> unregister_prot_hook(sk, false);
> packet_cached_dev_reset(po);

More likely the bug is in fanout_add(), with a buggy sequence in error
case, and not correct locking.

kfree(po->rollover);
po->rollover = NULL;

Two cpus entering fanout_add() (using the same af_packet socket,
syzkaller courtesy...) might both see po->fanout being NULL.

Then they grab the mutex.  Too late...
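
The check-then-lock shape described above can be sketched in user space.
This is only an illustration under stated assumptions: demo_sock,
demo_group and demo_fanout_mutex are hypothetical stand-ins, not the
af_packet structures, and a pthread mutex replaces the kernel mutex. The
first function tests the pointer before taking the mutex, which is the
window two CPUs can both slip through; the second keeps the test and the
update inside one critical section.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket and its fanout group. */
struct demo_group { int id; };
struct demo_sock  { struct demo_group *fanout; };

static pthread_mutex_t demo_fanout_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Racy shape: the "already joined?" test runs before the mutex is
 * taken, so two threads working on the same socket can both pass it.
 */
static int demo_fanout_add_racy(struct demo_sock *po, struct demo_group *g)
{
	if (po->fanout)
		return -EALREADY;

	pthread_mutex_lock(&demo_fanout_mutex);
	po->fanout = g;		/* a second thread would overwrite the first join */
	pthread_mutex_unlock(&demo_fanout_mutex);
	return 0;
}

/* Safer shape: test and update happen inside one critical section. */
static int demo_fanout_add_locked(struct demo_sock *po, struct demo_group *g)
{
	int err = 0;

	pthread_mutex_lock(&demo_fanout_mutex);
	if (po->fanout)
		err = -EALREADY;
	else
		po->fanout = g;
	pthread_mutex_unlock(&demo_fanout_mutex);
	return err;
}
```

Called sequentially, both variants behave the same; the difference only
shows when two threads enter with the same socket at the same time.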

Re: linux-next: build failure after merge of the selinux tree

2017-02-09 Thread Stephen Rothwell

Hi all,

On Tue, 10 Jan 2017 12:27:03 +1100 Stephen Rothwell  
wrote:
>
> After merging the selinux tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> In file included from /home/sfr/next/next/security/selinux/avc.c:35:0:
> /home/sfr/next/next/security/selinux/include/classmap.h:242:2: error: #error 
> New address family defined, please update secclass_map.
>  #error New address family defined, please update secclass_map.
>   ^
> /home/sfr/next/next/security/selinux/hooks.c: In function 
> 'socket_type_to_security_class':
> /home/sfr/next/next/security/selinux/hooks.c:1409:2: error: #error New 
> address family defined, please update this function.
> 
> Caused by commit
> 
>   da69a5306ab9 ("selinux: support distinctions among all network address 
> families")
> 
> interacting with commit
> 
>   ac7138746e14 ("smc: establish new socket family")
> 
> from the net-next tree.
> 
> I added the following merge fix patch:
> 
> From: Stephen Rothwell 
> Date: Tue, 10 Jan 2017 12:22:21 +1100
> Subject: [PATCH] selinux: merge fix for "smc: establish new socket family"
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  security/selinux/hooks.c| 4 +++-
>  security/selinux/include/classmap.h | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index bada3cd42b9c..712fd0e7c91d 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1405,7 +1405,9 @@ static inline u16 socket_type_to_security_class(int 
> family, int type, int protoc
>   return SECCLASS_KCM_SOCKET;
>   case PF_QIPCRTR:
>   return SECCLASS_QIPCRTR_SOCKET;
> -#if PF_MAX > 43
> + case PF_SMC:
> + return SECCLASS_SMC_SOCKET;
> +#if PF_MAX > 44
>  #error New address family defined, please update this function.
>  #endif
>   }
> diff --git a/security/selinux/include/classmap.h 
> b/security/selinux/include/classmap.h
> index 0dfd26d0b8d8..40f1d4f8bc2a 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -235,9 +235,11 @@ struct security_class_mapping secclass_map[] = {
> { COMMON_SOCK_PERMS, NULL } },
>   { "qipcrtr_socket",
> { COMMON_SOCK_PERMS, NULL } },
> + { "smc_socket",
> +   { COMMON_SOCK_PERMS, NULL } },
>   { NULL }
>};
>  
> -#if PF_MAX > 43
> +#if PF_MAX > 44
>  #error New address family defined, please update secclass_map.
>  #endif
> -- 
> 2.10.2

This now applies when I merge the security tree (as it merged the
selinux tree, presumably).
-- 
Cheers,
Stephen Rothwell
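
The "#if PF_MAX > 44 / #error" guard in the merge fix is a compile-time
tripwire: adding a family without updating the table breaks the build.
A minimal sketch of the same idea follows, with assumptions stated up
front: demo_family, demo_class_map and demo_family_to_class are
hypothetical names, and a C11 _Static_assert replaces #if because enum
constants are not visible to the preprocessor (PF_MAX is a macro, so the
kernel can use #if).

```c
#include <stddef.h>

/* Hypothetical address families; DEMO_PF_MAX must stay last. */
enum demo_family {
	DEMO_PF_UNIX,
	DEMO_PF_INET,
	DEMO_PF_SMC,
	DEMO_PF_MAX
};

/* One entry per family, in enum order. */
static const char *const demo_class_map[] = {
	"unix_socket",
	"inet_socket",
	"smc_socket",
};

/* Fails the build when a family is added without a table entry. */
_Static_assert(sizeof(demo_class_map) / sizeof(demo_class_map[0]) == DEMO_PF_MAX,
	       "New address family defined, please update demo_class_map.");

/* Returns the class name, or NULL for a family outside the table. */
static const char *demo_family_to_class(int family)
{
	if (family < 0 || family >= DEMO_PF_MAX)
		return NULL;
	return demo_class_map[family];
}
```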

Re: [PATCH net-next v5 04/11] bpf: Use bpf_load_program() from the library

2017-02-09 Thread Wangnan (F)




On 2017/2/10 10:25, Wangnan (F) wrote:



On 2017/2/10 7:21, Mickaël Salaün wrote:

Replace bpf_prog_load() with bpf_load_program() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
  tools/lib/bpf/bpf.c |  6 +++---
  tools/lib/bpf/bpf.h |  4 ++--
  tools/testing/selftests/bpf/Makefile|  4 +++-
  tools/testing/selftests/bpf/bpf_sys.h   | 21 -
  tools/testing/selftests/bpf/test_tag.c  |  6 --
  tools/testing/selftests/bpf/test_verifier.c |  8 +---
  6 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3ddb58a36d3c..58ce252073fa 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -42,7 +42,7 @@
  # endif
  #endif
  
-static __u64 ptr_to_u64(void *ptr)
+static __u64 ptr_to_u64(const void *ptr)
  {
  return (__u64) (unsigned long) ptr;
  }
@@ -69,8 +69,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
  return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
  }
  
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
- size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+ size_t insns_cnt, const char *license,
   __u32 kern_version, char *log_buf, size_t log_buf_sz)
  {
  int fd;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a2f9853dd882..bc959a2de023 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -28,8 +28,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
  
  /* Recommend log buffer size */
  #define BPF_LOG_BUF_SIZE 65536
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
- size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+ size_t insns_cnt, const char *license,
   __u32 kern_version, char *log_buf,
   size_t log_buf_sz);


> For libbpf changes:

And for similar code in patch 5-8:

> Acked-by Wang Nan 
>
> Thank you.

Acked-by Wang Nan 

Thank you.

Re: [PATCH 1/1] ixgbe: add the external ixgbe fiber transceiver status

2017-02-09 Thread Yanjun Zhu

On 2017/2/10 3:08, Tantilov, Emil S wrote:

>> -----Original Message-----
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>> Behalf Of Zhu Yanjun
>> Sent: Wednesday, February 08, 2017 7:03 PM
>> To: Kirsher, Jeffrey T ; broo...@kernel.org;
>> da...@davemloft.net; intel-wired-...@lists.osuosl.org;
>> netdev@vger.kernel.org
>> Subject: [PATCH 1/1] ixgbe: add the external ixgbe fiber transceiver status
>>
>> When the ixgbe fiber transceiver is external, it is necessary to get
>> the present/absent status of this external ixgbe fiber transceiver.
>
> The transceiver field was deprecated in the old ethtool API and is being
> removed in the new. This patch will not apply at all once those changes are
> made:
>
> http://patchwork.ozlabs.org/patch/725081/

Thanks for your kind reply. I will change this patch based on the above
changes.

Zhu Yanjun

> Thanks,
> Emil

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread David Miller

From: Tom Herbert 
Date: Thu, 9 Feb 2017 14:45:04 -0800

> On Thu, Feb 9, 2017 at 2:34 PM, David Miller  wrote:
>> From: Tom Herbert 
>> Date: Thu, 9 Feb 2017 14:26:50 -0800
>>
>>> On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
>>>> From: Tom Herbert 
>>>> Date: Wed, 8 Feb 2017 15:41:20 -0800
>>>>
>>>>> These hooks are also generic to allow for XDP/BPF programs as well
>>>>> as non-BPF code (e.g. kernel code can be written in a module).
>>>>
>>>> I don't think we should even remotely consider surrendering the XDP
>>>> hook to module code.
>>>>
>>>> We restrict it to eBPF for a reason, because that framework is
>>>> restricted in what it can do, what it can access, and how it can do
>>>> so.
>>>>
>>> Kernel modules go through extensive netdev review before they are
>>> taken into the kernel, for BPF programs we just allow what any user
>>> gives us without any peer review even implied.
>>
>> We can actually control what externally written XDP eBPF programs can
>> do, for kernel modules we have no such control or influence.  This
>> hook runs right in the driver and bypasses the entire stack, it has to
>> execute in a hardened thing that cannot crash and it will not as long
>> as BPF verifier is correct.
>>
>> And you're going to make it even more complicated what XDP offload in
>> hardware actually means.  With eBPF it is very clearly defined what
>> the necessary execution engine is.
>>
>> Tom I'm strongly against being allowed to run arbitrary module code
>> from the XDP hook, sorry.
>>
>> It is as important as the distinction between full stack offload and
>> partial offload in those nice charts in your talks. :-)
>>
> Yes it is. And the relevant principle that I would draw from that is
> the "offload" means offloading functionality from the kernel **to**
> the device. Restricting what we implement in the kernel on the basis
> of whether or not it can be offloaded to a device is completely
> backwards in this regard.

I didn't say that's the reason I'm against it.

I said it's because eBPF is constrained, and there is a very
well understood universe of operations it can perform and what
memory it can access.

Whereas modules can touch any piece of kernel memory, loop, crash,
deadlock, you name it.  None of which is possible with eBPF.
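
To make the "cannot loop" property concrete, here is a deliberately tiny
sketch of a load-time check that refuses any backward jump. It is a toy
under loud assumptions: toy_insn, toy_op and toy_verify are hypothetical
and bear no relation to struct bpf_insn or the in-kernel verifier, which
checks far more (memory access, register state, helper calls). It only
illustrates why a program with forward-only jumps must finish in a
bounded number of steps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction: not the real struct bpf_insn. */
enum toy_op { TOY_NOP, TOY_JMP, TOY_EXIT };

struct toy_insn {
	enum toy_op op;
	int off;	/* TOY_JMP target is pc + 1 + off */
};

/*
 * Accept a program only if every jump lands inside the program and
 * moves strictly forward, and the final instruction is TOY_EXIT.
 * With no backward jumps, each instruction runs at most once, so the
 * program finishes in at most n steps.
 */
static bool toy_verify(const struct toy_insn *prog, size_t n)
{
	size_t pc;

	if (n == 0 || prog[n - 1].op != TOY_EXIT)
		return false;

	for (pc = 0; pc < n; pc++) {
		if (prog[pc].op != TOY_JMP)
			continue;
		if (prog[pc].off < 0)
			return false;	/* back-edge: possible loop */
		if ((size_t)prog[pc].off >= n - pc - 1)
			return false;	/* jumps past the last instruction */
	}
	return true;
}
```

A kernel module gets no comparable check at load time, which is the
asymmetry the argument above rests on.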

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 5:48 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 9 Feb 2017 15:08:22 -0800
>
>> Okay, how about this... I'll add a configuration option like
>> XDP_ALLOW_OTHER_HOOKS. The default will be to disallow setting any
>> hook other than a BPF. If it is set, then we'll accept other hooks
>> to be run. This way we mostly restrict the interface by default, but
>> still allow experimentation with other hook types like I need with
>> TXDP or maybe the netfilter guys might want to fastpath netfilter
>> etc. When we bring a working robust implementation to netdev that
>> shows clear benefits then we can add those to BPF as the "allowed"
>> hooks at that time. So this strictly controls the interfaces, but
>> still also allows room for innovation.
>
> Anyone is allowed to "innovate" in their own private kernel tree.
>
> But I'm not unleashing that upstream.
>
> The only reason I accepted XDP is entirely because it is limited
> in scope to eBPF.  All eBPF programs execute in finite time,
> cannot loop, cannot deadlock, cannot access arbitrary pieces
> of kernel memory and datastructures.
>
> It is a well defined, constrained, and incredibly tightly controlled
> execution environment for implementing policy, monitoring and control.

And it's also incredibly invasive in the core data path of drivers.
TBH it is not clear to me that the narrow use cases for XDP justify
adding this complexity to every driver.

In any case, I withdraw the patch set.

Tom

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 5:42 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 9 Feb 2017 14:45:04 -0800
>
>> On Thu, Feb 9, 2017 at 2:34 PM, David Miller  wrote:
>>> From: Tom Herbert 
>>> Date: Thu, 9 Feb 2017 14:26:50 -0800
>>>
>>>> On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
>>>>> From: Tom Herbert 
>>>>> Date: Wed, 8 Feb 2017 15:41:20 -0800
>>>>>
>>>>>> These hooks are also generic to allow for XDP/BPF programs as well
>>>>>> as non-BPF code (e.g. kernel code can be written in a module).
>>>>>
>>>>> I don't think we should even remotely consider surrendering the XDP
>>>>> hook to module code.
>>>>>
>>>>> We restrict it to eBPF for a reason, because that framework is
>>>>> restricted in what it can do, what it can access, and how it can do
>>>>> so.
>>>>>
>>>> Kernel modules go through extensive netdev review before they are
>>>> taken into the kernel, for BPF programs we just allow what any user
>>>> gives us without any peer review even implied.
>>>
>>> We can actually control what externally written XDP eBPF programs can
>>> do, for kernel modules we have no such control or influence.  This
>>> hook runs right in the driver and bypasses the entire stack, it has to
>>> execute in a hardened thing that cannot crash and it will not as long
>>> as BPF verifier is correct.
>>>
>>> And you're going to make it even more complicated what XDP offload in
>>> hardware actually means.  With eBPF it is very clearly defined what
>>> the necessary execution engine is.
>>>
>>> Tom I'm strongly against being allowed to run arbitrary module code
>>> from the XDP hook, sorry.
>>>
>>> It is as important as the distinction between full stack offload and
>>> partial offload in those nice charts in your talks. :-)
>>>
>> Yes it is. And the relevant principle that I would draw from that is
>> the "offload" means offloading functionality from the kernel **to**
>> the device. Restricting what we implement in the kernel on the basis
>> of whether or not it can be offloaded to a device is completely
>> backwards in this regard.
>
> I didn't say that's the reason I'm against it.
>
> I said it's because eBPF is constrained, and there is a very
> well understood universe of operations it can perform and what
> memory it can access.
>
> Whereas modules can touch any piece of kernel memory, loop, crash,
> deadlock, you name it.  None of which is possible with eBPF.

So we have thousands of LOC coming into drivers every day anyway with
all those properties anyway, so this "restricted" environment solves
at best 1% of the problem.


I must admit, though, that "loops" in code now being considered evil
at the same level as deadlocks and crashes is amusing :-)

Tom

Re: [PATCH net-next v5 10/11] bpf: Remove bpf_sys.h from selftests

2017-02-09 Thread Wangnan (F)




On 2017/2/10 7:21, Mickaël Salaün wrote:

Add required dependency headers.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
  tools/lib/bpf/bpf.c |  6 ++
  tools/testing/selftests/bpf/bpf_sys.h   | 27 ---
  tools/testing/selftests/bpf/test_lpm_map.c  |  1 -
  tools/testing/selftests/bpf/test_lru_map.c  |  1 -
  tools/testing/selftests/bpf/test_maps.c |  1 -
  tools/testing/selftests/bpf/test_tag.c  |  3 +--
  tools/testing/selftests/bpf/test_verifier.c |  4 ++--
  7 files changed, 9 insertions(+), 34 deletions(-)
  delete mode 100644 tools/testing/selftests/bpf/bpf_sys.h

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f8a2b7fa7741..50e04cc5 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -50,7 +50,13 @@ static __u64 ptr_to_u64(const void *ptr)
  static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
   unsigned int size)
  {
+#ifdef __NR_bpf
return syscall(__NR_bpf, cmd, attr, size);
+#else
+   fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
+   errno = ENOSYS;
+   return -1;
+#endif
  }
  


We don't need to check __NR_bpf again. It has already
been checked at the header of this file:

#ifndef __NR_bpf
# if defined(__i386__)
#  define __NR_bpf 357
# elif defined(__x86_64__)
#  define __NR_bpf 321
# elif defined(__aarch64__)
#  define __NR_bpf 280
# else
#  error __NR_bpf not defined. libbpf does not support your arch.
# endif
#endif

Thank you.

Re: [PATCH net-next v5 04/11] bpf: Use bpf_load_program() from the library

2017-02-09 Thread Wangnan (F)




On 2017/2/10 7:21, Mickaël Salaün wrote:

Replace bpf_prog_load() with bpf_load_program() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
  tools/lib/bpf/bpf.c |  6 +++---
  tools/lib/bpf/bpf.h |  4 ++--
  tools/testing/selftests/bpf/Makefile|  4 +++-
  tools/testing/selftests/bpf/bpf_sys.h   | 21 -
  tools/testing/selftests/bpf/test_tag.c  |  6 --
  tools/testing/selftests/bpf/test_verifier.c |  8 +---
  6 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3ddb58a36d3c..58ce252073fa 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -42,7 +42,7 @@
  # endif
  #endif
  
-static __u64 ptr_to_u64(void *ptr)
+static __u64 ptr_to_u64(const void *ptr)
  {
return (__u64) (unsigned long) ptr;
  }
@@ -69,8 +69,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
  }
  
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t insns_cnt, const char *license,
 __u32 kern_version, char *log_buf, size_t log_buf_sz)
  {
int fd;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a2f9853dd882..bc959a2de023 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -28,8 +28,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
  
  /* Recommend log buffer size */
  #define BPF_LOG_BUF_SIZE 65536
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t insns_cnt, const char *license,
 __u32 kern_version, char *log_buf,
 size_t log_buf_sz);
  


For libbpf changes:

Acked-by Wang Nan 

Thank you.
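
What the const qualifiers buy callers can be shown with a small
stand-alone sketch. The names here are hypothetical (demo_ptr_to_u64,
demo_attr, demo_fill_attr are not libbpf symbols, and nothing is passed
to the kernel); the point is only that read-only instruction arrays and
string literals can be handed over without casting const away.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the patched helper: accepts read-only pointers. */
static uint64_t demo_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Hypothetical stand-in for the attribute block handed to the kernel. */
struct demo_attr {
	uint64_t insns;
	uint64_t license;
	uint32_t insn_cnt;
};

/* Hypothetical loader front end: only fills the attribute block. */
static void demo_fill_attr(struct demo_attr *attr, const uint32_t *insns,
			   size_t insns_cnt, const char *license)
{
	attr->insns = demo_ptr_to_u64(insns);
	attr->license = demo_ptr_to_u64(license);
	attr->insn_cnt = (uint32_t)insns_cnt;
}
```

With the old non-const prototypes, a call such as
demo_fill_attr(&attr, prog, 1, "GPL") with a "static const" prog array
would need a cast or trigger a discarded-qualifier warning.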

[PATCH net-next v2] net: phy: Allow splitting MDIO bus/device support from PHYs

2017-02-09 Thread Florian Fainelli

Introduce a new configuration symbol: MDIO_DEVICE which allows building
the MDIO devices and bus code, without pulling in the entire Ethernet
PHY library and devices code.

PHYLIB now selects MDIO_DEVICE and the relevant Makefile files are
updated to reflect that.

When MDIO_DEVICE (MDIO bus/device only) is selected, but not PHYLIB, we
have mdio-bus.ko as a loadable module, and it does not have a
module_exit() function because the safety of removing a bus class is
unclear.

When both MDIO_DEVICE and PHYLIB are enabled, we need to assemble
everything into a common loadable module: libphy.ko because of nasty
circular dependencies between phy.c, phy_device.c and mdio_bus.c which
are really tough to untangle.

Signed-off-by: Florian Fainelli 
---
Changes in v2:

- implement Russell's feedback
- solve the circular dependency in the CONFIG_MDIO_DEVICE + CONFIG_PHYLIB case

 drivers/net/Makefile |  2 +-
 drivers/net/phy/Kconfig  | 59 +++-
 drivers/net/phy/Makefile | 13 +++--
 drivers/net/phy/mdio-boardinfo.c |  1 +
 drivers/net/phy/mdio_bus.c   |  9 ++
 include/linux/phy.h  | 21 --
 6 files changed, 75 insertions(+), 30 deletions(-)

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd3ef5d..a701e390d48f 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_MII) += mii.o
 obj-$(CONFIG_MDIO) += mdio.o
 obj-$(CONFIG_NET) += Space.o loopback.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
-obj-$(CONFIG_PHYLIB) += phy/
+obj-$(CONFIG_MDIO_DEVICE) += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 8dbd59baa34d..01152fb9cb76 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -2,33 +2,12 @@
 # PHY Layer Configuration
 #
 
-menuconfig PHYLIB
-   tristate "PHY Device support and infrastructure"
-   depends on NETDEVICES
+menuconfig MDIO_DEVICE
+   tristate "MDIO bus device drivers"
help
- Ethernet controllers are usually attached to PHY
- devices.  This option provides infrastructure for
- managing PHY devices.
-
-if PHYLIB
-
-config SWPHY
-   bool
-
-config LED_TRIGGER_PHY
-   bool "Support LED triggers for tracking link state"
-   depends on LEDS_TRIGGERS
-   ---help---
- Adds support for a set of LED trigger events per-PHY.  Link
- state change will trigger the events, for consumption by an
- LED class driver.  There are triggers for each link speed currently
- supported by the phy, and are of the form:
-  <mii bus id>:<phy>:<speed>
-
- Where speed is in the form:
-   <Speed in megabits>Mbps or <Speed in gigabits>Gbps
+  MDIO devices and driver infrastructure code.
 
-comment "MDIO bus device drivers"
+if MDIO_DEVICE
 
 config MDIO_BCM_IPROC
tristate "Broadcom iProc MDIO bus controller"
@@ -160,6 +139,36 @@ config MDIO_XGENE
  This module provides a driver for the MDIO busses found in the
  APM X-Gene SoC's.
 
+endif
+
+menuconfig PHYLIB
+   tristate "PHY Device support and infrastructure"
+   depends on NETDEVICES
+   select MDIO_DEVICE
+   help
+ Ethernet controllers are usually attached to PHY
+ devices.  This option provides infrastructure for
+ managing PHY devices.
+
+if PHYLIB
+
+config SWPHY
+   bool
+
+config LED_TRIGGER_PHY
+   bool "Support LED triggers for tracking link state"
+   depends on LEDS_TRIGGERS
+   ---help---
+ Adds support for a set of LED trigger events per-PHY.  Link
+ state change will trigger the events, for consumption by an
+ LED class driver.  There are triggers for each link speed currently
+ supported by the phy, and are of the form:
+  <mii bus id>:<phy>:<speed>
+
+ Where speed is in the form:
+   <Speed in megabits>Mbps or <Speed in gigabits>Gbps
+
+
 comment "MII PHY device drivers"
 
 config AMD_PHY
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 407b0b601ea8..668d0cdc398f 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -1,7 +1,16 @@
 # Makefile for Linux PHY drivers and MDIO bus drivers
 
-libphy-y   := phy.o phy_device.o mdio_bus.o mdio_device.o \
-  mdio-boardinfo.o
+libphy-y   := phy.o phy_device.o
+mdio-bus-y += mdio_bus.o mdio_device.o mdio-boardinfo.o
+
+# PHYLIB implies MDIO_DEVICE, in that case, we have a bunch of circular
+# dependencies that does not make it possible to split mdio-bus objects into a
+# dedicated loadable module, so we bundle them all together into libphy.ko
+ifdef CONFIG_PHYLIB
+libphy-y   += $(mdio-bus-y)
+else
+obj-$(CONFIG_MDIO_DEVICE)  += mdio-bus.o
+endif
 libphy-$(CONFIG_SWPHY) += swphy.o
 libphy-$(CONFIG_LED_TRIGGER_PHY)

Re: [PATCH net] l2tp: do not use udp_ioctl()

2017-02-09 Thread kbuild test robot

Hi Eric,

[auto build test ERROR on net/master]

url:
https://github.com/0day-ci/linux/commits/Eric-Dumazet/l2tp-do-not-use-udp_ioctl/20170210-042926
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "l2tp_ioctl" [net/l2tp/l2tp_ip6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-09 Thread David Miller

From: Ivan Khoronzhuk 
Date: Fri, 10 Feb 2017 00:54:24 +0200

> On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
>> From: Ivan Khoronzhuk 
>> Date: Thu,  9 Feb 2017 02:07:34 +0200
>> 
>> > These two patches fix suspend/resume chain.
>> 
>> Patch 2 doesn't apply cleanly to the 'net' tree, please
>> respin this series.
> 
> Strange, I've just checked it on net-next/master, it was applied w/o any
> warnings.

It makes no sense to test "net-next" when I am telling you that it is
the "net" tree it doesn't apply to.

This is a bug fix, so it should be targeting the "net" tree.

Re: fs, net: deadlock between bind/splice on af_unix

2017-02-09 Thread Cong Wang

On Tue, Feb 7, 2017 at 6:20 AM, Mateusz Guzik  wrote:
>
> Yes, but unix_release_sock is expected to leave the file behind.
> Note I'm not claiming there is a leak, but that racing threads will be
> able to trigger a condition where you create a file and fail to bind it.
>

Which is expected, right? No one guarantees that successful file
creation means a successful bind; the previous code did, but that is
not part of the API AFAIK. Should a sane user-space application check
the file creation for a successful bind() or just check its return value?

> What to do with the file now?
>

We just do what unix_release_sock() does, so why do you keep
asking the same question?

If you still complain about the race with user-space, think about the
same race in-between a successful bind() and close(), nothing is new.
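
For reference, the "just check its return value" approach looks roughly
like this in user space. It is a sketch, not taken from the thread:
demo_bind_unix is a hypothetical helper name, and the only calls used
are the standard socket(), bind() and close().

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind a new AF_UNIX stream socket to path.
 * Returns the socket fd on success, or -1 with errno set by the
 * failing call. The caller never inspects the filesystem.
 */
static int demo_bind_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd, saved;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		saved = errno;	/* close() may clobber errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

An application written this way treats a leftover socket file the same
as any other bind() failure on that path: it sees an error such as
EADDRINUSE and decides what to do, without assuming that the existence
of the file says anything about who owns it.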

Re: Extending socket timestamping API for NTP

2017-02-09 Thread Denny Page


> On Feb 09, 2017, at 11:42, sdncurious  wrote:
> 
> I am still at a loss as to why transpose is required in case of HW
> time stamping. If STF is used for both Tx and Rx time stamping the
> timing is absolutely correct.

Perhaps this will help. The specific transposition is:

  transposed_timestamp_ns = timestamp_ns +
      (frame_len_bits * 1000000000) / (interface_speed * 1000000)

The transposition is applied to received timestamps only.
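
As a worked sketch of that correction, assuming the timestamp is in
nanoseconds and the link speed is given in Mbit/s (both assumptions;
demo_transpose_rx_ns is a hypothetical name): bits divided by Mbit/s
yields microseconds, so the added term reduces to
frame_len_bits * 1000 / speed.

```c
#include <stdint.h>

/*
 * Shift a receive timestamp by the time the frame occupies the wire.
 * Assumes timestamp_ns is in nanoseconds and speed_mbps is the link
 * speed in Mbit/s. bits / (Mbit/s) yields microseconds, hence the
 * factor of 1000 to get nanoseconds.
 */
static uint64_t demo_transpose_rx_ns(uint64_t timestamp_ns,
				     uint64_t frame_len_bits,
				     uint64_t speed_mbps)
{
	if (speed_mbps == 0)
		return timestamp_ns;	/* speed unknown: leave untouched */
	return timestamp_ns + (frame_len_bits * 1000) / speed_mbps;
}
```

For a 1518-byte frame (12144 bits) this adds 12144 ns at 1000 Mbit/s and
121440 ns at 100 Mbit/s, which shows why the correction matters more on
slower links.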

Denny

Re: [PATCH RFC net] net/mlx5e: Add preemption enable/disable around TC statistics upcall

2017-02-09 Thread Jakub Kicinski

On Thu,  9 Feb 2017 17:38:43 +0200, Or Gerlitz wrote:
> Running with CONFIG_PREEMPT set, I get a
> 
> BUG: using smp_processor_id() in preemptible [] code: tc/3793
> 
> assertion from the TC action (mirred) stats_update callback, when they do
> 
>   _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)
> 
> As done by commit 66860be "nfp: bpf: allow offloaded filters to update stats",
> disabling/enabling preemption around the TC upcall solves that.
> 
> Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
> Signed-off-by: Or Gerlitz 
> ---
> 
> I marked it as RFC, since I wasn't fully sure on the nature of the 
> problem, nor if this is the direction we should take to the fix.

I think it's the right fix. For net-next we could perhaps redo the
tcf_action_stats_update() helper so that it takes care of preemption and
the iteration, so more people don't trip over this?

Re: [PATCH iproute2 net-next] man: ip-link.8: Document bridge_slave fdb_flush option

2017-02-09 Thread Stephen Hemminger

On Wed,  8 Feb 2017 16:02:20 +0800
Hangbin Liu  wrote:

> Signed-off-by: Hangbin Liu 
> ---
>  man/man8/ip-link.8.in | 5 +
>  1 file changed, 5 insertions(+)

Applied thanks.

Re: [iproute PATCH v2 0/2] Two minor testsuite fixes

2017-02-09 Thread Stephen Hemminger

On Thu,  9 Feb 2017 11:50:53 +0100
Phil Sutter  wrote:

> While playing around with testsuite, I noticed two minor nits which this
> series attempts to fix.
> 
> Changes since v1:
> - Replaced patch1 completely.
> 
> Phil Sutter (2):
>   testsuite: Generate nlmsg blob at runtime
>   testsuite: Search kernel config in modules dir also
> 
>  .gitignore|   2 +
>  testsuite/Makefile|   8 +++
>  testsuite/tests/ip/link/dev_wo_vf_rate.nl | Bin 14076 -> 0 bytes
>  testsuite/tools/Makefile  |   2 +
>  testsuite/tools/generate_nlmsg.c  | 116 
> ++
>  5 files changed, 128 insertions(+)
>  delete mode 100644 testsuite/tests/ip/link/dev_wo_vf_rate.nl
>  create mode 100644 testsuite/tools/Makefile
>  create mode 100644 testsuite/tools/generate_nlmsg.c
> 

Nice. Applied.
I suspect the tests are rarely run because the core functionality doesn't
change often. But in new environments these would be really useful.

Re: [PATCH net-next 4/4] net/sched: cls_bpf: Use skip flags to reflect HW offload status

2017-02-09 Thread Jakub Kicinski

On Thu,  9 Feb 2017 16:18:08 +0200, Or Gerlitz wrote:
> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using both policy (no flag).
> 
> Reuse the skip flags to show the insertion status by setting
> the skip_hw flag in case the filter wasn't offloaded.
> 
> Signed-off-by: Or Gerlitz 
> ---
>  net/sched/cls_bpf.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index d9c9701..91ba90d 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -185,14 +185,23 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
>   return -EINVAL;
>   }
>   } else {
> - if (!tc_should_offload(dev, tp, prog->gen_flags))
> - return skip_sw ? -EINVAL : 0;
> + if (!tc_should_offload(dev, tp, prog->gen_flags)) {
> + if (tc_skip_sw(prog->gen_flags))
> + return -EINVAL;
> + prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
> + return 0;
> + }
>   cmd = TC_CLSBPF_ADD;
>   }
>  
>   ret = cls_bpf_offload_cmd(tp, obj, cmd);
> - if (ret)
> - return skip_sw ? ret : 0;
> +
> + if (ret) {
> + if (skip_sw)
> + return ret;
> + prog->gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
> + return 0;
> + }
>  
>   obj->offloaded = true;

In cls_bpf we already store information about whether the program is
offloaded (see the @offloaded member).  Could we simplify the code
thanks to this?

I'm obviously all for reporting whether tc objects are offloaded or not
but let me ask perhaps the silly question of why reuse the SKIP_HW flag?
We don't have to worry about flag bits running out, could it be clearer
to users to report whether object is present in HW using a new flag?  Or
even two flags for present/non-present so user doesn't have to ponder
what no flag means (old kernel or not offloaded?). I don't really mind
either way I'm just wondering what the motivation was and maybe how
others feel.
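
To make the two-flag idea concrete, here is a small user-space sketch of
how a reader of the flags could tell the three cases apart. The flag names
and bit values are purely hypothetical and are not taken from the patch or
from any kernel header.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only */
#define EX_FLAG_SKIP_HW		(1U << 0)
#define EX_FLAG_SKIP_SW		(1U << 1)
#define EX_FLAG_IN_HW		(1U << 2)	/* object is present in HW */
#define EX_FLAG_NOT_IN_HW	(1U << 3)	/* object is known not to be in HW */

enum ex_hw_state {
	EX_HW_UNKNOWN,	/* neither flag set: kernel does not report status */
	EX_HW_PRESENT,
	EX_HW_ABSENT,
};

/* Map the reported flags to an explicit offload state. */
static enum ex_hw_state ex_classify(uint32_t flags)
{
	if (flags & EX_FLAG_IN_HW)
		return EX_HW_PRESENT;
	if (flags & EX_FLAG_NOT_IN_HW)
		return EX_HW_ABSENT;
	return EX_HW_UNKNOWN;
}
```

With a single reused SKIP_HW bit the last two cases cannot be told apart,
which is the ambiguity the question above is about.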

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Cong Wang

On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following use-after-free report in packet_rcv_fanout
> while running syzkaller fuzzer on linux-next
> e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> is not reproducible, but maybe the stacks will allow you to figure out
> what happens.
>
> BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> kernel/locking/lockdep.c:3224 at addr 8801d903d538
> Read of size 8 by task syz-executor1/10596
> CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
>
> Call Trace:
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
>  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
>  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
>  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
>  xmit_one net/core/dev.c:2870 [inline]
>  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
>  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
>  neigh_hh_output include/net/neighbour.h:468 [inline]
>  dst_neigh_output include/net/dst.h:452 [inline]
>  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
>  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
>  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
>  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
>  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
>  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
>  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
>  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
>  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x660/0x810 net/socket.c:1685
>  SyS_sendto+0x40/0x50 net/socket.c:1653
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

It seems in-flight packets could still refer to the struct sock pointer
via f->arr[i]; if so, we need a sync before unlinking it:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..8724a98 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
sock_prot_inuse_add(net, sk->sk_prot, -1);
preempt_enable();

+   synchronize_net();
+
spin_lock(&po->bind_lock);
unregister_prot_hook(sk, false);
packet_cached_dev_reset(po);

[PATCH net-next v5 10/11] bpf: Remove bpf_sys.h from selftests

2017-02-09 Thread Mickaël Salaün

Add required dependency headers.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c |  6 ++
 tools/testing/selftests/bpf/bpf_sys.h   | 27 ---
 tools/testing/selftests/bpf/test_lpm_map.c  |  1 -
 tools/testing/selftests/bpf/test_lru_map.c  |  1 -
 tools/testing/selftests/bpf/test_maps.c |  1 -
 tools/testing/selftests/bpf/test_tag.c  |  3 +--
 tools/testing/selftests/bpf/test_verifier.c |  4 ++--
 7 files changed, 9 insertions(+), 34 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/bpf_sys.h

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f8a2b7fa7741..50e04cc5 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -50,7 +50,13 @@ static __u64 ptr_to_u64(const void *ptr)
 static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
   unsigned int size)
 {
+#ifdef __NR_bpf
return syscall(__NR_bpf, cmd, attr, size);
+#else
+   fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
+   errno = ENOSYS;
+   return -1;
+#endif
 }
 
 int bpf_create_map(enum bpf_map_type map_type, int key_size,
diff --git a/tools/testing/selftests/bpf/bpf_sys.h b/tools/testing/selftests/bpf/bpf_sys.h
deleted file mode 100644
index aa076a8a07f7..
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ /dev/null
@@ -1,27 +0,0 @@
-#ifndef __BPF_SYS__
-#define __BPF_SYS__
-
-#include 
-#include 
-
-#include 
-
-#include 
-
-static inline __u64 bpf_ptr_to_u64(const void *ptr)
-{
-   return (__u64)(unsigned long) ptr;
-}
-
-static inline int bpf(int cmd, union bpf_attr *attr, unsigned int size)
-{
-#ifdef __NR_bpf
-   return syscall(__NR_bpf, cmd, attr, size);
-#else
-   fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
-   errno = ENOSYS;
-   return -1;
-#endif
-}
-
-#endif /* __BPF_SYS__ */
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c b/tools/testing/selftests/bpf/test_lpm_map.c
index 3cc812cac2d7..e97565243d59 100644
--- a/tools/testing/selftests/bpf/test_lpm_map.c
+++ b/tools/testing/selftests/bpf/test_lpm_map.c
@@ -23,7 +23,6 @@
 #include 
 
 #include 
-#include "bpf_sys.h"
 #include "bpf_util.h"
 
 struct tlpm_node {
diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c
index 48973ded1c96..00b0aff56e2e 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -19,7 +19,6 @@
 #include 
 
 #include 
-#include "bpf_sys.h"
 #include "bpf_util.h"
 
 #define LOCAL_FREE_TARGET  (128)
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 39168499f43f..cada17ac00b8 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -22,7 +22,6 @@
 #include 
 
 #include 
-#include "bpf_sys.h"
 #include "bpf_util.h"
 
 static int map_flags;
diff --git a/tools/testing/selftests/bpf/test_tag.c b/tools/testing/selftests/bpf/test_tag.c
index ae4263638cd5..de409fc50c35 100644
--- a/tools/testing/selftests/bpf/test_tag.c
+++ b/tools/testing/selftests/bpf/test_tag.c
@@ -1,3 +1,4 @@
+#include 
 #include 
 #include 
 #include 
@@ -20,8 +21,6 @@
 
 #include "../../../include/linux/filter.h"
 
-#include "bpf_sys.h"
-
 static struct bpf_insn prog[BPF_MAXINSNS];
 
 static void bpf_gen_imm_prog(unsigned int insns, int fd_map)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 63818cbb9fb1..e1f5b9eea1e8 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8,7 +8,9 @@
  * License as published by the Free Software Foundation.
  */
 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,8 +30,6 @@
 
 #include "../../../include/linux/filter.h"
 
-#include "bpf_sys.h"
-
 #ifndef ARRAY_SIZE
 # define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #endif
-- 
2.11.0

[PATCH net-next v5 06/11] bpf: Use bpf_map_lookup_elem() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_map_lookup() with bpf_map_lookup_elem() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c|  2 +-
 tools/lib/bpf/bpf.h|  2 +-
 tools/testing/selftests/bpf/bpf_sys.h  | 11 ---
 tools/testing/selftests/bpf/test_lpm_map.c | 16 
 tools/testing/selftests/bpf/test_lru_map.c | 28 ++--
 tools/testing/selftests/bpf/test_maps.c| 30 +++---
 6 files changed, 39 insertions(+), 50 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 1de762677a2f..b1a1f58b99e0 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -112,7 +112,7 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_map_lookup_elem(int fd, void *key, void *value)
+int bpf_map_lookup_elem(int fd, const void *key, void *value)
 {
union bpf_attr attr;
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 2458534c8b33..171cf594f782 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -36,7 +36,7 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
 
-int bpf_map_lookup_elem(int fd, void *key, void *value);
+int bpf_map_lookup_elem(int fd, const void *key, void *value);
 int bpf_map_delete_elem(int fd, void *key);
 int bpf_map_get_next_key(int fd, void *key, void *next_key);
 int bpf_obj_pin(int fd, const char *pathname);
diff --git a/tools/testing/selftests/bpf/bpf_sys.h b/tools/testing/selftests/bpf/bpf_sys.h
index e08dec0db9e0..0a5a6060db70 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -24,17 +24,6 @@ static inline int bpf(int cmd, union bpf_attr *attr, unsigned int size)
 #endif
 }
 
-static inline int bpf_map_lookup(int fd, const void *key, void *value)
-{
-   union bpf_attr attr = {};
-
-   attr.map_fd = fd;
-   attr.key = bpf_ptr_to_u64(key);
-   attr.value = bpf_ptr_to_u64(value);
-
-   return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
-}
-
 static inline int bpf_map_delete(int fd, const void *key)
 {
union bpf_attr attr = {};
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c b/tools/testing/selftests/bpf/test_lpm_map.c
index e29ffbcd2932..bd08394c26cb 100644
--- a/tools/testing/selftests/bpf/test_lpm_map.c
+++ b/tools/testing/selftests/bpf/test_lpm_map.c
@@ -211,7 +211,7 @@ static void test_lpm_map(int keysize)
 
key->prefixlen = 8 * keysize;
memcpy(key->data, data, keysize);
-   r = bpf_map_lookup(map, key, value);
+   r = bpf_map_lookup_elem(map, key, value);
assert(!r || errno == ENOENT);
assert(!t == !!r);
 
@@ -300,32 +300,32 @@ static void test_lpm_ipaddr(void)
 
/* Test some lookups that should come back with a value */
inet_pton(AF_INET, "192.168.128.23", key_ipv4->data);
-   assert(bpf_map_lookup(map_fd_ipv4, key_ipv4, &value) == 0);
+   assert(bpf_map_lookup_elem(map_fd_ipv4, key_ipv4, &value) == 0);
assert(value == 3);
 
inet_pton(AF_INET, "192.168.0.1", key_ipv4->data);
-   assert(bpf_map_lookup(map_fd_ipv4, key_ipv4, &value) == 0);
+   assert(bpf_map_lookup_elem(map_fd_ipv4, key_ipv4, &value) == 0);
assert(value == 2);
 
inet_pton(AF_INET6, "2a00:1450:4001:814::", key_ipv6->data);
-   assert(bpf_map_lookup(map_fd_ipv6, key_ipv6, &value) == 0);
+   assert(bpf_map_lookup_elem(map_fd_ipv6, key_ipv6, &value) == 0);
assert(value == 0xdeadbeef);
 
inet_pton(AF_INET6, "2a00:1450:4001:814::1", key_ipv6->data);
-   assert(bpf_map_lookup(map_fd_ipv6, key_ipv6, &value) == 0);
+   assert(bpf_map_lookup_elem(map_fd_ipv6, key_ipv6, &value) == 0);
assert(value == 0xdeadbeef);
 
/* Test some lookups that should not match any entry */
inet_pton(AF_INET, "10.0.0.1", key_ipv4->data);
-   assert(bpf_map_lookup(map_fd_ipv4, key_ipv4, &value) == -1 &&
+   assert(bpf_map_lookup_elem(map_fd_ipv4, key_ipv4, &value) == -1 &&
   errno == ENOENT);
 
inet_pton(AF_INET, "11.11.11.11", key_ipv4->data);
-   assert(bpf_map_lookup(map_fd_ipv4, key_ipv4, &value) == -1 &&
+   assert(bpf_map_lookup_elem(map_fd_ipv4, key_ipv4, &value) == -1 &&
   errno == ENOENT);
 
inet_pton(AF_INET6, "2a00:::", key_ipv6->data);
-   assert(bpf_map_lookup(map_fd_ipv6, key_ipv6, &value) == -1 &&
+   assert(bpf_map_lookup_elem(map_fd_ipv6, key_ipv6, &value) == -1 &&
   errno == ENOENT);
 
close(map_fd_ipv4);
diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c
index 2f61b5817af4..eccf6d96e551

[PATCH v2 net] l2tp: do not use udp_ioctl()

2017-02-09 Thread Eric Dumazet

From: Eric Dumazet 

udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(

L2TP should use its own handler, because it really does not
look the same.

SIOCINQ for instance should not assume UDP checksum or headers.

Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.

While crashes only happen on recent kernels (after commit 
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.

Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
Acked-by: Paolo Abeni 
---
v2: Adding the EXPORT_SYMBOL(l2tp_ioctl) for ipv6, of course...

 net/l2tp/l2tp_core.h |1 +
 net/l2tp/l2tp_ip.c   |   27 ++-
 net/l2tp/l2tp_ip6.c  |2 +-
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 8f560f7140a05694c13904d9b171ba67d9d11292..aebf281d09eeb31c531eb624bd2ddd78cab8da9b 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -263,6 +263,7 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb,
 int l2tp_nl_register_ops(enum l2tp_pwtype pw_type,
 const struct l2tp_nl_cmd_ops *ops);
 void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type);
+int l2tp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 
 /* Session reference counts. Incremented when code obtains a reference
  * to a session.
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 3d73278b86ca34bfbd774dc8f52e490169445e1b..28c21546d5b60dcd07bbf6347389e97c918bf40f 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -11,6 +11,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -553,6 +554,30 @@ static int l2tp_ip_recvmsg(struct sock *sk, struct msghdr *msg,
return err ? err : copied;
 }
 
+int l2tp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+{
+   struct sk_buff *skb;
+   int amount;
+
+   switch (cmd) {
+   case SIOCOUTQ:
+   amount = sk_wmem_alloc_get(sk);
+   break;
+   case SIOCINQ:
+   spin_lock_bh(&sk->sk_receive_queue.lock);
+   skb = skb_peek(&sk->sk_receive_queue);
+   amount = skb ? skb->len : 0;
+   spin_unlock_bh(&sk->sk_receive_queue.lock);
+   break;
+
+   default:
+   return -ENOIOCTLCMD;
+   }
+
+   return put_user(amount, (int __user *)arg);
+}
+EXPORT_SYMBOL(l2tp_ioctl);
+
 static struct proto l2tp_ip_prot = {
.name  = "L2TP/IP",
.owner = THIS_MODULE,
@@ -561,7 +586,7 @@ static struct proto l2tp_ip_prot = {
.bind  = l2tp_ip_bind,
.connect   = l2tp_ip_connect,
.disconnect= l2tp_ip_disconnect,
-   .ioctl = udp_ioctl,
+   .ioctl = l2tp_ioctl,
.destroy   = l2tp_ip_destroy_sock,
.setsockopt= ip_setsockopt,
.getsockopt= ip_getsockopt,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 331ccf5a7bad80e011997e071489d7775b0c68c6..f47c45250f86c9189e0a6bbfd92b21cbe2069406 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -722,7 +722,7 @@ static struct proto l2tp_ip6_prot = {
.bind  = l2tp_ip6_bind,
.connect   = l2tp_ip6_connect,
.disconnect= l2tp_ip6_disconnect,
-   .ioctl = udp_ioctl,
+   .ioctl = l2tp_ioctl,
.destroy   = l2tp_ip6_destroy_sock,
.setsockopt= ipv6_setsockopt,
.getsockopt= ipv6_getsockopt,
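
For readers less familiar with the request handled by l2tp_ioctl() above,
the following user-space sketch shows what a SIOCINQ query looks like from
the application side. It uses an AF_UNIX datagram socket pair as a stand-in,
since opening an L2TP socket needs more setup; the helper name is
hypothetical, and the behaviour described is assumed for Linux only.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one datagram over an AF_UNIX socket pair and ask the receiving
 * end how many bytes are pending. On Linux, FIONREAD is the same
 * request number as SIOCINQ. Returns the reported byte count, or -1
 * on error.
 */
static int queued_bytes_after_send(const void *msg, size_t len)
{
	int fds[2];
	int amount = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
		return -1;

	/* Only query the receive side if the whole datagram was queued. */
	if (send(fds[0], msg, len, 0) == (ssize_t)len &&
	    ioctl(fds[1], FIONREAD, &amount) < 0)
		amount = -1;

	close(fds[0]);
	close(fds[1]);
	return amount;
}
```

As in the SIOCINQ branch of the patch, the value reported for a datagram
socket here is the length of the first queued message rather than a total
that accounts for protocol headers.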

[PATCH net-next v5 07/11] bpf: Use bpf_map_delete_elem() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_map_delete() with bpf_map_delete_elem() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c|  2 +-
 tools/lib/bpf/bpf.h|  2 +-
 tools/testing/selftests/bpf/bpf_sys.h  | 10 --
 tools/testing/selftests/bpf/test_lru_map.c |  6 +++---
 tools/testing/selftests/bpf/test_maps.c| 22 +++---
 5 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index b1a1f58b99e0..eab8c6bfbf8f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -124,7 +124,7 @@ int bpf_map_lookup_elem(int fd, const void *key, void *value)
return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_map_delete_elem(int fd, void *key)
+int bpf_map_delete_elem(int fd, const void *key)
 {
union bpf_attr attr;
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 171cf594f782..f559f648db45 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -37,7 +37,7 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
 
 int bpf_map_lookup_elem(int fd, const void *key, void *value);
-int bpf_map_delete_elem(int fd, void *key);
+int bpf_map_delete_elem(int fd, const void *key);
 int bpf_map_get_next_key(int fd, void *key, void *next_key);
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
diff --git a/tools/testing/selftests/bpf/bpf_sys.h b/tools/testing/selftests/bpf/bpf_sys.h
index 0a5a6060db70..17581a42e1d9 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -24,16 +24,6 @@ static inline int bpf(int cmd, union bpf_attr *attr, unsigned int size)
 #endif
 }
 
-static inline int bpf_map_delete(int fd, const void *key)
-{
-   union bpf_attr attr = {};
-
-   attr.map_fd = fd;
-   attr.key = bpf_ptr_to_u64(key);
-
-   return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
-}
-
 static inline int bpf_map_next_key(int fd, const void *key, void *next_key)
 {
union bpf_attr attr = {};
diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c
index eccf6d96e551..859c940a6e41 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -324,7 +324,7 @@ static void test_lru_sanity2(int map_type, int map_flags, unsigned int tgt_free)
if (map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
assert(!bpf_map_update_elem(lru_map_fd, &key, value,
BPF_NOEXIST));
-   assert(!bpf_map_delete(lru_map_fd, &key));
+   assert(!bpf_map_delete_elem(lru_map_fd, &key));
} else {
assert(bpf_map_update_elem(lru_map_fd, &key, value,
   BPF_EXIST));
@@ -483,8 +483,8 @@ static void test_lru_sanity4(int map_type, int map_flags, unsigned int tgt_free)
}
 
for (; key <= 2 * tgt_free; key++) {
-   assert(!bpf_map_delete(lru_map_fd, &key));
-   assert(bpf_map_delete(lru_map_fd, &key));
+   assert(!bpf_map_delete_elem(lru_map_fd, &key));
+   assert(bpf_map_delete_elem(lru_map_fd, &key));
}
 
end_key = key + 2 * tgt_free;
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 5db1a939af69..0f9f90455375 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -86,7 +86,7 @@ static void test_hashmap(int task, void *data)
 
/* Check that key = 0 doesn't exist. */
key = 0;
-   assert(bpf_map_delete(fd, &key) == -1 && errno == ENOENT);
+   assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
/* Iterate over two elements. */
assert(bpf_map_next_key(fd, &key, &next_key) == 0 &&
@@ -98,10 +98,10 @@ static void test_hashmap(int task, void *data)
 
/* Delete both elements. */
key = 1;
-   assert(bpf_map_delete(fd, &key) == 0);
+   assert(bpf_map_delete_elem(fd, &key) == 0);
key = 2;
-   assert(bpf_map_delete(fd, &key) == 0);
-   assert(bpf_map_delete(fd, &key) == -1 && errno == ENOENT);
+   assert(bpf_map_delete_elem(fd, &key) == 0);
+   assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
key = 0;
/* Check that map is empty. */
@@ -172,7 +172,7 @@ static void test_hashmap_percpu(int task, void *data)
   errno == E2BIG);
 
/* Check that key = 0 doesn't exist. */
-   assert(bpf_map_delete(fd, &key) == -1 && errno == ENOENT);
+   assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
/* Iterate over two elements. */
while (!bpf_map_next_key(fd, &key, &next_key)) {
@@ -194,10 +194,10 @@ static void test_hashmap_percpu(int task,

Re: [PATCH v4 net-next 06/10] openvswitch: Refactor labels initialization.

2017-02-09 Thread Joe Stringer

On 9 February 2017 at 11:21, Jarno Rajahalme  wrote:
> Refactoring conntrack labels initialization makes changes in later
> patches easier to review.
>
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Acked-by: Joe Stringer

[PATCH net-next v5 01/11] tools: Sync {,tools/}include/uapi/linux/bpf.h

2017-02-09 Thread Mickaël Salaün

The tools version of this header is out of date; update it to the latest
version from kernel header.

Synchronize with the following commits:
* b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
* a5e8c07059d0 ("bpf: add bpf_probe_read_str helper")
* d1b662adcdb8 ("bpf: allow option for setting bpf_l4_csum_replace from scratch")

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Gianluca Borello 
---
 tools/include/uapi/linux/bpf.h | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0eb0e87dbe9f..e07fd5a324e6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -63,6 +63,12 @@ struct bpf_insn {
__s32   imm;/* signed immediate constant */
 };
 
+/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */
+struct bpf_lpm_trie_key {
+   __u32   prefixlen;  /* up to 32 for AF_INET, 128 for AF_INET6 */
+   __u8    data[0];    /* Arbitrary size */
+};
+
 /* BPF syscall commands, see bpf(2) man-page for details. */
 enum bpf_cmd {
BPF_MAP_CREATE,
@@ -89,6 +95,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGROUP_ARRAY,
BPF_MAP_TYPE_LRU_HASH,
BPF_MAP_TYPE_LRU_PERCPU_HASH,
+   BPF_MAP_TYPE_LPM_TRIE,
 };
 
 enum bpf_prog_type {
@@ -430,6 +437,18 @@ union bpf_attr {
  * @xdp_md: pointer to xdp_md
  * @delta: An positive/negative integer to be added to xdp_md.data
  * Return: 0 on success or negative on error
+ *
+ * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * Copy a NUL terminated string from unsafe address. In case the string
+ * length is smaller than size, the target is not padded with further NUL
+ * bytes. In case the string length is larger than size, just count-1
+ * bytes are copied and the last byte is set to NUL.
+ * @dst: destination address
+ * @size: maximum number of bytes to copy, including the trailing NUL
+ * @unsafe_ptr: unsafe address
+ * Return:
+ *   > 0 length of the string including the trailing NUL on success
+ *   < 0 error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -476,7 +495,8 @@ union bpf_attr {
FN(set_hash_invalid),   \
FN(get_numa_node_id),   \
FN(skb_change_head),\
-   FN(xdp_adjust_head),
+   FN(xdp_adjust_head),\
+   FN(probe_read_str),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -502,6 +522,7 @@ enum bpf_func_id {
 /* BPF_FUNC_l4_csum_replace flags. */
 #define BPF_F_PSEUDO_HDR   (1ULL << 4)
 #define BPF_F_MARK_MANGLED_0   (1ULL << 5)
+#define BPF_F_MARK_ENFORCE (1ULL << 6)
 
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 #define BPF_F_INGRESS  (1ULL << 0)
-- 
2.11.0
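
As a side note on the bpf_lpm_trie_key layout synced above, here is a
minimal user-space sketch of building an IPv4 key with that shape. The
struct is redeclared locally under a hypothetical name (lpm_key) so the
example builds without recent kernel headers, and make_ipv4_lpm_key() is
likewise an illustrative helper, not part of any library.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct bpf_lpm_trie_key (hypothetical name) */
struct lpm_key {
	uint32_t prefixlen;	/* up to 32 for AF_INET */
	uint8_t  data[];	/* address bytes in network order */
};

/*
 * Allocate a key for "addr/prefixlen". Returns NULL on a bad prefix
 * length, an unparsable address, or allocation failure. The caller
 * frees the result with free().
 */
static struct lpm_key *make_ipv4_lpm_key(const char *addr, uint32_t prefixlen)
{
	struct lpm_key *key;

	if (prefixlen > 32)
		return NULL;

	key = calloc(1, sizeof(*key) + sizeof(struct in_addr));
	if (!key)
		return NULL;

	if (inet_pton(AF_INET, addr, key->data) != 1) {
		free(key);
		return NULL;
	}
	key->prefixlen = prefixlen;
	return key;
}
```

The allocation size, header plus address bytes, follows the same pattern
as the alloca(sizeof(*key) + keysize) call in test_lpm_map.c.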

[PATCH net-next v5 11/11] bpf: Add test_tag to .gitignore

2017-02-09 Thread Mickaël Salaün

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/testing/selftests/bpf/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index d3b1c9bca407..541d9d7fad5a 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -2,3 +2,4 @@ test_verifier
 test_maps
 test_lru_map
 test_lpm_map
+test_tag
-- 
2.11.0

[PATCH net-next v5 03/11] bpf: Always test unprivileged programs

2017-02-09 Thread Mickaël Salaün

If selftests are run as root, then execute the unprivileged checks as
well. This increases the number of tests from 243 to 368.

The test numbers are suffixed with "/u" when executed as unprivileged or
with "/p" when executed as privileged.

The geteuid() check is replaced with a capability check.

Handling capabilities requires the libcap dependency.

Signed-off-by: Mickaël Salaün 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/testing/selftests/bpf/Makefile|  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 68 ++---
 2 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index c470c7301636..f3d65ad53494 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,4 +1,4 @@
-CFLAGS += -Wall -O2 -I../../../include/uapi
+CFLAGS += -Wall -O2 -lcap -I../../../include/uapi
 
 test_objs = test_verifier test_tag test_maps test_lru_map test_lpm_map
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 71f6407cde60..878bd60da376 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include 
@@ -4574,6 +4575,55 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
goto close_fds;
 }
 
+static bool is_admin(void)
+{
+   cap_t caps;
+   cap_flag_value_t sysadmin = CAP_CLEAR;
+   const cap_value_t cap_val = CAP_SYS_ADMIN;
+
+   if (!CAP_IS_SUPPORTED(CAP_SETFCAP)) {
+   perror("cap_get_flag");
+   return false;
+   }
+   caps = cap_get_proc();
+   if (!caps) {
+   perror("cap_get_proc");
+   return false;
+   }
+   if (cap_get_flag(caps, cap_val, CAP_EFFECTIVE, &sysadmin))
+   perror("cap_get_flag");
+   if (cap_free(caps))
+   perror("cap_free");
+   return (sysadmin == CAP_SET);
+}
+
+static int set_admin(bool admin)
+{
+   cap_t caps;
+   const cap_value_t cap_val = CAP_SYS_ADMIN;
+   int ret = -1;
+
+   caps = cap_get_proc();
+   if (!caps) {
+   perror("cap_get_proc");
+   return -1;
+   }
+   if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+   admin ? CAP_SET : CAP_CLEAR)) {
+   perror("cap_set_flag");
+   goto out;
+   }
+   if (cap_set_proc(caps)) {
+   perror("cap_set_proc");
+   goto out;
+   }
+   ret = 0;
+out:
+   if (cap_free(caps))
+   perror("cap_free");
+   return ret;
+}
+
 static int do_test(bool unpriv, unsigned int from, unsigned int to)
 {
int i, passes = 0, errors = 0;
@@ -4584,11 +4634,19 @@ static int do_test(bool unpriv, unsigned int from, unsigned int to)
/* Program types that are not supported by non-root we
 * skip right away.
 */
-   if (unpriv && test->prog_type)
-   continue;
+   if (!test->prog_type) {
+   if (!unpriv)
+   set_admin(false);
+   printf("#%d/u %s ", i, test->descr);
+   do_test_single(test, true, &passes, &errors);
+   if (!unpriv)
+   set_admin(true);
+   }
 
-   printf("#%d %s ", i, test->descr);
-   do_test_single(test, unpriv, &passes, &errors);
+   if (!unpriv) {
+   printf("#%d/p %s ", i, test->descr);
+   do_test_single(test, false, &passes, &errors);
+   }
}
 
printf("Summary: %d PASSED, %d FAILED\n", passes, errors);
@@ -4600,7 +4658,7 @@ int main(int argc, char **argv)
struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
struct rlimit rlim = { 1 << 20, 1 << 20 };
unsigned int from = 0, to = ARRAY_SIZE(tests);
-   bool unpriv = geteuid() != 0;
+   bool unpriv = !is_admin();
 
if (argc == 3) {
unsigned int l = atoi(argv[argc - 2]);
-- 
2.11.0
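
The is_admin() helper in the patch above relies on libcap. For comparison,
the same "is CAP_SYS_ADMIN in the effective set?" question can be asked
without that dependency through the raw capget(2) syscall. This is only a
sketch under that assumption, with a hypothetical function name; it is not
what the patch does.

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if CAP_SYS_ADMIN is in the caller's effective set,
 * 0 if it is not, and -1 if capget(2) fails.
 */
static int has_effective_sys_admin(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = { { 0 } };

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	/* Capabilities are split across 32-bit words; pick word and bit. */
	return !!(data[CAP_TO_INDEX(CAP_SYS_ADMIN)].effective &
		  CAP_TO_MASK(CAP_SYS_ADMIN));
}
```

Unlike geteuid() == 0, this reflects the capability set actually in
effect, which is the distinction the commit message draws.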

[PATCH net-next v5 09/11] bpf: Use bpf_create_map() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_map_create() with bpf_create_map() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/testing/selftests/bpf/bpf_sys.h   | 15 ---
 tools/testing/selftests/bpf/test_lpm_map.c  |  6 +++---
 tools/testing/selftests/bpf/test_lru_map.c  |  4 ++--
 tools/testing/selftests/bpf/test_maps.c | 14 +++---
 tools/testing/selftests/bpf/test_tag.c  |  2 +-
 tools/testing/selftests/bpf/test_verifier.c |  4 ++--
 6 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_sys.h 
b/tools/testing/selftests/bpf/bpf_sys.h
index aeff99f0a411..aa076a8a07f7 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -24,19 +24,4 @@ static inline int bpf(int cmd, union bpf_attr *attr, 
unsigned int size)
 #endif
 }
 
-static inline int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
-uint32_t size_value, uint32_t max_elem,
-uint32_t flags)
-{
-   union bpf_attr attr = {};
-
-   attr.map_type = type;
-   attr.key_size = size_key;
-   attr.value_size = size_value;
-   attr.max_entries = max_elem;
-   attr.map_flags = flags;
-
-   return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
-}
-
 #endif /* __BPF_SYS__ */
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c 
b/tools/testing/selftests/bpf/test_lpm_map.c
index bd08394c26cb..3cc812cac2d7 100644
--- a/tools/testing/selftests/bpf/test_lpm_map.c
+++ b/tools/testing/selftests/bpf/test_lpm_map.c
@@ -183,7 +183,7 @@ static void test_lpm_map(int keysize)
key = alloca(sizeof(*key) + keysize);
memset(key, 0, sizeof(*key) + keysize);
 
-   map = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE,
+   map = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
 sizeof(*key) + keysize,
 keysize + 1,
 4096,
@@ -253,12 +253,12 @@ static void test_lpm_ipaddr(void)
key_ipv4 = alloca(key_size_ipv4);
key_ipv6 = alloca(key_size_ipv6);
 
-   map_fd_ipv4 = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE,
+   map_fd_ipv4 = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
 key_size_ipv4, sizeof(value),
 100, BPF_F_NO_PREALLOC);
assert(map_fd_ipv4 >= 0);
 
-   map_fd_ipv6 = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE,
+   map_fd_ipv6 = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
 key_size_ipv6, sizeof(value),
 100, BPF_F_NO_PREALLOC);
assert(map_fd_ipv6 >= 0);
diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 360f7e006eb6..48973ded1c96 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -31,11 +31,11 @@ static int create_map(int map_type, int map_flags, unsigned 
int size)
 {
int map_fd;
 
-   map_fd = bpf_map_create(map_type, sizeof(unsigned long long),
+   map_fd = bpf_create_map(map_type, sizeof(unsigned long long),
sizeof(unsigned long long), size, map_flags);
 
if (map_fd == -1)
-   perror("bpf_map_create");
+   perror("bpf_create_map");
 
return map_fd;
 }
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index be52c808d6cf..39168499f43f 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -32,7 +32,7 @@ static void test_hashmap(int task, void *data)
long long key, next_key, value;
int fd;
 
-   fd = bpf_map_create(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
2, map_flags);
if (fd < 0) {
printf("Failed to create hashmap '%s'!\n", strerror(errno));
@@ -119,7 +119,7 @@ static void test_hashmap_percpu(int task, void *data)
int expected_key_mask = 0;
int fd, i;
 
-   fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
+   fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
sizeof(value[0]), 2, map_flags);
if (fd < 0) {
printf("Failed to create hashmap '%s'!\n", strerror(errno));
@@ -212,7 +212,7 @@ static void test_arraymap(int task, void *data)
int key, next_key, fd;
long long value;
 
-   fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value),
+   fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value),
2, 0);
if (fd < 0) {
printf("Failed to create arraymap '%s'!\n", strerror(errno));
@@ -266,7 +266,7 @@

[PATCH net-next v5 08/11] bpf: Use bpf_map_get_next_key() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_map_next_key() with bpf_map_get_next_key() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c|  2 +-
 tools/lib/bpf/bpf.h|  2 +-
 tools/testing/selftests/bpf/bpf_sys.h  | 11 --
 tools/testing/selftests/bpf/test_lru_map.c |  2 +-
 tools/testing/selftests/bpf/test_maps.c| 34 +++---
 5 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index eab8c6bfbf8f..f8a2b7fa7741 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -135,7 +135,7 @@ int bpf_map_delete_elem(int fd, const void *key)
return sys_bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_map_get_next_key(int fd, void *key, void *next_key)
+int bpf_map_get_next_key(int fd, const void *key, void *next_key)
 {
union bpf_attr attr;
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index f559f648db45..88f07c15423a 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -38,7 +38,7 @@ int bpf_map_update_elem(int fd, const void *key, const void 
*value,
 
 int bpf_map_lookup_elem(int fd, const void *key, void *value);
 int bpf_map_delete_elem(int fd, const void *key);
-int bpf_map_get_next_key(int fd, void *key, void *next_key);
+int bpf_map_get_next_key(int fd, const void *key, void *next_key);
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
 int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
diff --git a/tools/testing/selftests/bpf/bpf_sys.h 
b/tools/testing/selftests/bpf/bpf_sys.h
index 17581a42e1d9..aeff99f0a411 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -24,17 +24,6 @@ static inline int bpf(int cmd, union bpf_attr *attr, 
unsigned int size)
 #endif
 }
 
-static inline int bpf_map_next_key(int fd, const void *key, void *next_key)
-{
-   union bpf_attr attr = {};
-
-   attr.map_fd = fd;
-   attr.key = bpf_ptr_to_u64(key);
-   attr.next_key = bpf_ptr_to_u64(next_key);
-
-   return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
-}
-
 static inline int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
 uint32_t size_value, uint32_t max_elem,
 uint32_t flags)
diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 859c940a6e41..360f7e006eb6 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -46,7 +46,7 @@ static int map_subset(int map0, int map1)
unsigned long long value0[nr_cpus], value1[nr_cpus];
int ret;
 
-   while (!bpf_map_next_key(map1, _key, _key)) {
+   while (!bpf_map_get_next_key(map1, _key, _key)) {
assert(!bpf_map_lookup_elem(map1, _key, value1));
ret = bpf_map_lookup_elem(map0, _key, value0);
if (ret) {
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 0f9f90455375..be52c808d6cf 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -89,11 +89,11 @@ static void test_hashmap(int task, void *data)
assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
/* Iterate over two elements. */
-   assert(bpf_map_next_key(fd, &key, &next_key) == 0 &&
+   assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 &&
   (next_key == 1 || next_key == 2));
-   assert(bpf_map_next_key(fd, &next_key, &next_key) == 0 &&
+   assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 &&
   (next_key == 1 || next_key == 2));
-   assert(bpf_map_next_key(fd, &next_key, &next_key) == -1 &&
+   assert(bpf_map_get_next_key(fd, &next_key, &next_key) == -1 &&
   errno == ENOENT);
 
/* Delete both elements. */
@@ -105,7 +105,7 @@ static void test_hashmap(int task, void *data)
 
key = 0;
/* Check that map is empty. */
-   assert(bpf_map_next_key(fd, &key, &next_key) == -1 &&
+   assert(bpf_map_get_next_key(fd, &key, &next_key) == -1 &&
   errno == ENOENT);
 
close(fd);
@@ -175,7 +175,7 @@ static void test_hashmap_percpu(int task, void *data)
assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
 
/* Iterate over two elements. */
-   while (!bpf_map_next_key(fd, &key, &next_key)) {
+   while (!bpf_map_get_next_key(fd, &key, &next_key)) {
assert((expected_key_mask & next_key) == next_key);
expected_key_mask &= ~next_key;
 
@@ -201,7 +201,7 @@ static void test_hashmap_percpu(int task, void *data)
 
key = 0;
/* Check that map is empty. */
-   assert(bpf_map_next_key(fd, &key, &next_key) == -1 &&
+   assert(bpf_map_get_next_key(fd, &key, &next_key) == -1 &&
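
The iteration contract that these assertions exercise (0 while a next
key exists, then -1 with errno set to ENOENT) can be tried without a
kernel map. The sketch below is a mock, not the library implementation:
mock_map, mock_get_next_key and mock_count_keys are hypothetical names,
and the fallback to the first key for an unknown start key is modelled
on the hash-map behaviour the tests rely on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a BPF map: an ordered key set. */
struct mock_map {
	const long long *keys;
	size_t nr_keys;
};

/* Mimics the calling convention of bpf_map_get_next_key(): returns 0
 * and stores the key that follows *key in *next_key, or returns -1
 * with errno = ENOENT when no key is left. A start key that is not in
 * the map restarts the walk at the first key. */
static int mock_get_next_key(const struct mock_map *map,
			     const long long *key, long long *next_key)
{
	size_t i;

	assert(map && key && next_key);

	for (i = 0; i < map->nr_keys; i++) {
		if (map->keys[i] != *key)
			continue;
		if (i + 1 == map->nr_keys) {
			errno = ENOENT;
			return -1;
		}
		*next_key = map->keys[i + 1];
		return 0;
	}

	if (map->nr_keys == 0) {
		errno = ENOENT;
		return -1;
	}
	*next_key = map->keys[0];
	return 0;
}

/* Walk every key the way the selftests do. Assumes -1 is never stored
 * as a key, so it can serve as the start value. */
static size_t mock_count_keys(const struct mock_map *map)
{
	long long key = -1, next_key;
	size_t n = 0;

	while (!mock_get_next_key(map, &key, &next_key)) {
		n++;
		key = next_key;
	}
	return n;
}
```

Because the key argument is only read, it can be declared const, which
is the signature change this patch makes in tools/lib/bpf.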

[PATCH net-next v5 05/11] bpf: Use bpf_map_update_elem() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_map_update() with bpf_map_update_elem() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c|  2 +-
 tools/lib/bpf/bpf.h|  2 +-
 tools/testing/selftests/bpf/bpf_sys.h  | 13 
 tools/testing/selftests/bpf/test_lpm_map.c | 15 ++---
 tools/testing/selftests/bpf/test_lru_map.c | 97 +-
 tools/testing/selftests/bpf/test_maps.c| 61 ++-
 6 files changed, 99 insertions(+), 91 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 58ce252073fa..1de762677a2f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -98,7 +98,7 @@ int bpf_load_program(enum bpf_prog_type type, const struct 
bpf_insn *insns,
return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
-int bpf_map_update_elem(int fd, void *key, void *value,
+int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags)
 {
union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index bc959a2de023..2458534c8b33 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -33,7 +33,7 @@ int bpf_load_program(enum bpf_prog_type type, const struct 
bpf_insn *insns,
 __u32 kern_version, char *log_buf,
 size_t log_buf_sz);
 
-int bpf_map_update_elem(int fd, void *key, void *value,
+int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
 
 int bpf_map_lookup_elem(int fd, void *key, void *value);
diff --git a/tools/testing/selftests/bpf/bpf_sys.h 
b/tools/testing/selftests/bpf/bpf_sys.h
index e7bbe3e5402e..e08dec0db9e0 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -35,19 +35,6 @@ static inline int bpf_map_lookup(int fd, const void *key, 
void *value)
return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
 }
 
-static inline int bpf_map_update(int fd, const void *key, const void *value,
-uint64_t flags)
-{
-   union bpf_attr attr = {};
-
-   attr.map_fd = fd;
-   attr.key = bpf_ptr_to_u64(key);
-   attr.value = bpf_ptr_to_u64(value);
-   attr.flags = flags;
-
-   return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
-}
-
 static inline int bpf_map_delete(int fd, const void *key)
 {
union bpf_attr attr = {};
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c 
b/tools/testing/selftests/bpf/test_lpm_map.c
index 26775c00273f..e29ffbcd2932 100644
--- a/tools/testing/selftests/bpf/test_lpm_map.c
+++ b/tools/testing/selftests/bpf/test_lpm_map.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include "bpf_sys.h"
 #include "bpf_util.h"
 
@@ -198,7 +199,7 @@ static void test_lpm_map(int keysize)
 
key->prefixlen = value[keysize];
memcpy(key->data, value, keysize);
-   r = bpf_map_update(map, key, value, 0);
+   r = bpf_map_update_elem(map, key, value, 0);
assert(!r);
}
 
@@ -266,32 +267,32 @@ static void test_lpm_ipaddr(void)
value = 1;
key_ipv4->prefixlen = 16;
inet_pton(AF_INET, "192.168.0.0", key_ipv4->data);
-   assert(bpf_map_update(map_fd_ipv4, key_ipv4, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv4, key_ipv4, &value, 0) == 0);
 
value = 2;
key_ipv4->prefixlen = 24;
inet_pton(AF_INET, "192.168.0.0", key_ipv4->data);
-   assert(bpf_map_update(map_fd_ipv4, key_ipv4, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv4, key_ipv4, &value, 0) == 0);
 
value = 3;
key_ipv4->prefixlen = 24;
inet_pton(AF_INET, "192.168.128.0", key_ipv4->data);
-   assert(bpf_map_update(map_fd_ipv4, key_ipv4, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv4, key_ipv4, &value, 0) == 0);
 
value = 5;
key_ipv4->prefixlen = 24;
inet_pton(AF_INET, "192.168.1.0", key_ipv4->data);
-   assert(bpf_map_update(map_fd_ipv4, key_ipv4, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv4, key_ipv4, &value, 0) == 0);
 
value = 4;
key_ipv4->prefixlen = 23;
inet_pton(AF_INET, "192.168.0.0", key_ipv4->data);
-   assert(bpf_map_update(map_fd_ipv4, key_ipv4, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv4, key_ipv4, &value, 0) == 0);
 
value = 0xdeadbeef;
key_ipv6->prefixlen = 64;
inet_pton(AF_INET6, "2a00:1450:4001:814::200e", key_ipv6->data);
-   assert(bpf_map_update(map_fd_ipv6, key_ipv6, &value, 0) == 0);
+   assert(bpf_map_update_elem(map_fd_ipv6, key_ipv6, &value, 0) == 0);
 
/* Set tprefixlen to maximum for lookups */
key_ipv4->prefixlen = 32;
diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 9f7bd1915c21..2f61b5817af4

[PATCH net-next v5 04/11] bpf: Use bpf_load_program() from the library

2017-02-09 Thread Mickaël Salaün

Replace bpf_prog_load() with bpf_load_program() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 
---
 tools/lib/bpf/bpf.c |  6 +++---
 tools/lib/bpf/bpf.h |  4 ++--
 tools/testing/selftests/bpf/Makefile|  4 +++-
 tools/testing/selftests/bpf/bpf_sys.h   | 21 -
 tools/testing/selftests/bpf/test_tag.c  |  6 --
 tools/testing/selftests/bpf/test_verifier.c |  8 +---
 6 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3ddb58a36d3c..58ce252073fa 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -42,7 +42,7 @@
 # endif
 #endif
 
-static __u64 ptr_to_u64(void *ptr)
+static __u64 ptr_to_u64(const void *ptr)
 {
return (__u64) (unsigned long) ptr;
 }
@@ -69,8 +69,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
 
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t insns_cnt, const char *license,
 __u32 kern_version, char *log_buf, size_t log_buf_sz)
 {
int fd;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a2f9853dd882..bc959a2de023 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -28,8 +28,8 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, 
int value_size,
 
 /* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE 65536
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-size_t insns_cnt, char *license,
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+size_t insns_cnt, const char *license,
 __u32 kern_version, char *log_buf,
 size_t log_buf_sz);
 
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index f3d65ad53494..a35f564f66a1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,4 +1,4 @@
-CFLAGS += -Wall -O2 -lcap -I../../../include/uapi
+CFLAGS += -Wall -O2 -lcap -I../../../include/uapi -I../../../lib
 
 test_objs = test_verifier test_tag test_maps test_lru_map test_lpm_map
 
@@ -7,6 +7,8 @@ TEST_FILES := $(test_objs)
 
 all: $(test_objs)
 
+$(test_objs): ../../../lib/bpf/bpf.o
+
 include ../lib.mk
 
 clean:
diff --git a/tools/testing/selftests/bpf/bpf_sys.h 
b/tools/testing/selftests/bpf/bpf_sys.h
index 6b4565f2a3f2..e7bbe3e5402e 100644
--- a/tools/testing/selftests/bpf/bpf_sys.h
+++ b/tools/testing/selftests/bpf/bpf_sys.h
@@ -84,25 +84,4 @@ static inline int bpf_map_create(enum bpf_map_type type, 
uint32_t size_key,
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
 
-static inline int bpf_prog_load(enum bpf_prog_type type,
-   const struct bpf_insn *insns, size_t size_insns,
-   const char *license, char *log, size_t size_log)
-{
-   union bpf_attr attr = {};
-
-   attr.prog_type = type;
-   attr.insns = bpf_ptr_to_u64(insns);
-   attr.insn_cnt = size_insns / sizeof(struct bpf_insn);
-   attr.license = bpf_ptr_to_u64(license);
-
-   if (size_log > 0) {
-   attr.log_buf = bpf_ptr_to_u64(log);
-   attr.log_size = size_log;
-   attr.log_level = 1;
-   log[0] = 0;
-   }
-
-   return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
-}
-
 #endif /* __BPF_SYS__ */
diff --git a/tools/testing/selftests/bpf/test_tag.c 
b/tools/testing/selftests/bpf/test_tag.c
index 5f7c602f47d1..dc209721ffd5 100644
--- a/tools/testing/selftests/bpf/test_tag.c
+++ b/tools/testing/selftests/bpf/test_tag.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 
+#include 
+
 #include "../../../include/linux/filter.h"
 
 #include "bpf_sys.h"
@@ -55,8 +57,8 @@ static int bpf_try_load_prog(int insns, int fd_map,
int fd_prog;
 
bpf_filler(insns, fd_map);
-   fd_prog = bpf_prog_load(BPF_PROG_TYPE_SCHED_CLS, prog, insns *
-   sizeof(struct bpf_insn), "", NULL, 0);
+   fd_prog = bpf_load_program(BPF_PROG_TYPE_SCHED_CLS, prog, insns, "", 0,
+  NULL, 0);
assert(fd_prog > 0);
if (fd_map > 0)
bpf_filler(insns, 0);
diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 878bd60da376..247830ecf68e 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 
+#include 
+
 #include "../../../include/linux/filter.h"
 
 #include "bpf_sys.h"
@@ -4535,9 +4537,9 @@ static
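
Two details of this conversion are easy to miss: the library helper now
accepts const pointers, and bpf_load_program() takes an instruction
count where the removed bpf_prog_load() wrapper took a size in bytes.
The sketch below illustrates both under stated assumptions: sketch_insn
is a hypothetical 8-byte stand-in for struct bpf_insn (so no kernel
headers are needed), and sketch_ptr_to_u64 and sketch_insn_count are
illustrative names, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte stand-in for struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* Same shape as the library helper after this patch: a const pointer
 * parameter lets callers pass read-only programs without a cast. */
static uint64_t sketch_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* The removed wrapper divided a byte size by the instruction size; the
 * library call expects the resulting count directly. */
static size_t sketch_insn_count(size_t size_in_bytes)
{
	assert(size_in_bytes % sizeof(struct sketch_insn) == 0);
	return size_in_bytes / sizeof(struct sketch_insn);
}
```

This is why the test_tag.c hunk above drops the multiplication by
sizeof(struct bpf_insn) when it switches to bpf_load_program().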

[PATCH net-next v5 00/11] Improve BPF selftests and use the library (net-next tree)

2017-02-09 Thread Mickaël Salaün

This series brings some fixes to selftests, adds the ability to test
unprivileged BPF programs as root, and replaces bpf_sys.h with calls to
the BPF library.

This is intended for the net-next tree and applies on c0e4dadb3494 ("net: dsa:
mv88e6xxx: Move forward declaration to where it is needed").

Changes since v4:
* align text for function calls as requested by Daniel Borkmann
  (bpf_load_program and bpf_map_update_elem)
* rebase

Changes since v3:
* keep the bzero() calls

Changes since v2:
* use the patches from two previous series (unprivileged tests and bpf_sys.h
  replacement)
* include one more stdint.h
* rebase on net-next
* add this cover letter

Changes since v1:
* exclude patches not intended for the net-next tree

Regards,

Mickaël Salaün (11):
  tools: Sync {,tools/}include/uapi/linux/bpf.h
  bpf: Change the include directory for selftest
  bpf: Always test unprivileged programs
  bpf: Use bpf_load_program() from the library
  bpf: Use bpf_map_update_elem() from the library
  bpf: Use bpf_map_lookup_elem() from the library
  bpf: Use bpf_map_delete_elem() from the library
  bpf: Use bpf_map_get_next_key() from the library
  bpf: Use bpf_create_map() from the library
  bpf: Remove bpf_sys.h from selftests
  bpf: Add test_tag to .gitignore

 tools/include/uapi/linux/bpf.h  |  23 +++-
 tools/lib/bpf/bpf.c |  20 ++--
 tools/lib/bpf/bpf.h |  12 +--
 tools/testing/selftests/bpf/.gitignore  |   1 +
 tools/testing/selftests/bpf/Makefile|   4 +-
 tools/testing/selftests/bpf/bpf_sys.h   | 108 ---
 tools/testing/selftests/bpf/test_lpm_map.c  |  38 +++
 tools/testing/selftests/bpf/test_lru_map.c  | 138 +---
 tools/testing/selftests/bpf/test_maps.c | 162 ++--
 tools/testing/selftests/bpf/test_tag.c  |  11 +-
 tools/testing/selftests/bpf/test_verifier.c |  84 ---
 11 files changed, 301 insertions(+), 300 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/bpf_sys.h

-- 
2.11.0

[PATCH net-next v5 02/11] bpf: Change the include directory for selftest

2017-02-09 Thread Mickaël Salaün

Use the tools include directory instead of the installed one to allow
builds from other kernels.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Daniel Borkmann 
Cc: David S. Miller 
---
 tools/testing/selftests/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 769a6cb42b4b..c470c7301636 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,4 +1,4 @@
-CFLAGS += -Wall -O2 -I../../../../usr/include
+CFLAGS += -Wall -O2 -I../../../include/uapi
 
 test_objs = test_verifier test_tag test_maps test_lru_map test_lpm_map
 
-- 
2.11.0

Re: [PATCH 1/2] net: ethernet: ucc_geth: fix MEM_PART_MURAM mode

2017-02-09 Thread Li Yang

On Tue, Feb 7, 2017 at 3:05 AM, Christophe Leroy
 wrote:
> Since commit 5093bb965a163 ("powerpc/QE: switch to the cpm_muram
> implementation"), muram area is not part of immrbar mapping anymore
> so immrbar_virt_to_phys() is not usable anymore.
>
> Fixes: 5093bb965a163 ("powerpc/QE: switch to the cpm_muram implementation")
> Signed-off-by: Christophe Leroy 

Acked-by: Li Yang 

Regards,
Leo

Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-09 Thread Ivan Khoronzhuk

On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
> From: Ivan Khoronzhuk 
> Date: Thu,  9 Feb 2017 02:07:34 +0200
> 
> > These two patches fix suspend/resume chain.
> 
> Patch 2 doesn't apply cleanly to the 'net' tree, please
> respin this series.

Strange, I've just checked it on net-next/master, it was applied w/o any
warnings.

Re: [PATCH 2/2] soc/fsl/qe: get rid of immrbar_virt_to_phys()

2017-02-09 Thread Li Yang

On Tue, Feb 7, 2017 at 3:05 AM, Christophe Leroy
 wrote:
> immrbar_virt_to_phys() is not used anymore
>
> Signed-off-by: Christophe Leroy 

Acked-by: Li Yang 

Regards,
Leo

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 2:34 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 9 Feb 2017 14:26:50 -0800
>
>> On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
>>> From: Tom Herbert 
>>> Date: Wed, 8 Feb 2017 15:41:20 -0800
>>>
>>>> These hooks are also generic to allow for XDP/BPF programs as well
>>>> as non-BPF code (e.g. kernel code can be written in a module).
>>>
>>> I don't think we should even remotely consider surrendering the XDP
>>> hook to module code.
>>>
>>> We restrict it to eBPF for a reason, because that framework is
>>> restricted in what it can do, what it can access, and how it can do
>>> so.
>>>
>> Kernel modules go through extensive netdev review before they are
>> taken into the kernel, for BPF programs we just allow what any user
>> gives us without any peer review even implied.
>
> We can actually control what externally written XDP eBPF programs can
> do, for kernel modules we have no such control or influence.  This
> hook runs right in the driver and bypasses the entire stack, it has to
> execute in a hardened thing that cannot crash and it will not as long
> as BPF verifier is correct.
>
> And you're going to make it even more complicated what XDP offload in
> hardware actually means.  With eBPF it is very clearly defined what
> the necessary execution engine is.
>
> Tom I'm strongly against being allowed to run arbitrary module code
> from the XDP hook, sorry.
>
Okay, how about this... I'll add a configuration option like
XDP_ALLOW_OTHER_HOOKS. The default will be to disallow setting any
hook other than a BPF one. If it is set, then we'll accept other hooks
to be run. This way we mostly restrict the interface by default, but
still allow experimentation with other hook types, like I need with
TXDP, or like the netfilter guys might want in order to fastpath
netfilter, etc. When we bring a working, robust implementation to
netdev that shows clear benefits, then we can add those to BPF as the
"allowed" hooks at that time. So this strictly controls the
interfaces, but still also allows room for innovation.

Tom

> It is as important as the distinction between full stack offload and
> partial offload in those nice charts in your talks. :-)
>

[PATCH] net: natsemi: use new api ethtool_{get|set}_link_ksettings

2017-02-09 Thread Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
We move this driver to the new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/natsemi/natsemi.c |  119 ++--
 1 files changed, 67 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/natsemi.c 
b/drivers/net/ethernet/natsemi/natsemi.c
index 8e72679..18af2a2 100644
--- a/drivers/net/ethernet/natsemi/natsemi.c
+++ b/drivers/net/ethernet/natsemi/natsemi.c
@@ -640,8 +640,10 @@ struct netdev_private {
 static int netdev_get_wol(struct net_device *dev, u32 *supported, u32 *cur);
 static int netdev_set_sopass(struct net_device *dev, u8 *newval);
 static int netdev_get_sopass(struct net_device *dev, u8 *data);
-static int netdev_get_ecmd(struct net_device *dev, struct ethtool_cmd *ecmd);
-static int netdev_set_ecmd(struct net_device *dev, struct ethtool_cmd *ecmd);
+static int netdev_get_ecmd(struct net_device *dev,
+  struct ethtool_link_ksettings *ecmd);
+static int netdev_set_ecmd(struct net_device *dev,
+  const struct ethtool_link_ksettings *ecmd);
 static void enable_wol_mode(struct net_device *dev, int enable_intr);
 static int netdev_close(struct net_device *dev);
 static int netdev_get_regs(struct net_device *dev, u8 *buf);
@@ -2584,7 +2586,8 @@ static int get_eeprom_len(struct net_device *dev)
return np->eeprom_size;
 }
 
-static int get_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
+static int get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *ecmd)
 {
struct netdev_private *np = netdev_priv(dev);
spin_lock_irq(&np->lock);
@@ -2593,7 +2596,8 @@ static int get_settings(struct net_device *dev, struct 
ethtool_cmd *ecmd)
return 0;
 }
 
-static int set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
+static int set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *ecmd)
 {
struct netdev_private *np = netdev_priv(dev);
int res;
@@ -2689,8 +2693,6 @@ static int get_eeprom(struct net_device *dev, struct 
ethtool_eeprom *eeprom, u8
.get_drvinfo = get_drvinfo,
.get_regs_len = get_regs_len,
.get_eeprom_len = get_eeprom_len,
-   .get_settings = get_settings,
-   .set_settings = set_settings,
.get_wol = get_wol,
.set_wol = set_wol,
.get_regs = get_regs,
@@ -2699,6 +2701,8 @@ static int get_eeprom(struct net_device *dev, struct 
ethtool_eeprom *eeprom, u8
.nway_reset = nway_reset,
.get_link = get_link,
.get_eeprom = get_eeprom,
+   .get_link_ksettings = get_link_ksettings,
+   .set_link_ksettings = set_link_ksettings,
 };
 
 static int netdev_set_wol(struct net_device *dev, u32 newval)
@@ -2828,29 +2832,32 @@ static int netdev_get_sopass(struct net_device *dev, u8 
*data)
return 0;
 }
 
-static int netdev_get_ecmd(struct net_device *dev, struct ethtool_cmd *ecmd)
+static int netdev_get_ecmd(struct net_device *dev,
+  struct ethtool_link_ksettings *ecmd)
 {
struct netdev_private *np = netdev_priv(dev);
+   u32 supported, advertising;
u32 tmp;
 
-   ecmd->port= dev->if_port;
-   ethtool_cmd_speed_set(ecmd, np->speed);
-   ecmd->duplex  = np->duplex;
-   ecmd->autoneg = np->autoneg;
-   ecmd->advertising = 0;
+   ecmd->base.port   = dev->if_port;
+   ecmd->base.speed  = np->speed;
+   ecmd->base.duplex = np->duplex;
+   ecmd->base.autoneg = np->autoneg;
+   advertising = 0;
+
if (np->advertising & ADVERTISE_10HALF)
-   ecmd->advertising |= ADVERTISED_10baseT_Half;
+   advertising |= ADVERTISED_10baseT_Half;
if (np->advertising & ADVERTISE_10FULL)
-   ecmd->advertising |= ADVERTISED_10baseT_Full;
+   advertising |= ADVERTISED_10baseT_Full;
if (np->advertising & ADVERTISE_100HALF)
-   ecmd->advertising |= ADVERTISED_100baseT_Half;
+   advertising |= ADVERTISED_100baseT_Half;
if (np->advertising & ADVERTISE_100FULL)
-   ecmd->advertising |= ADVERTISED_100baseT_Full;
-   ecmd->supported   = (SUPPORTED_Autoneg |
+   advertising |= ADVERTISED_100baseT_Full;
+   supported   = (SUPPORTED_Autoneg |
SUPPORTED_10baseT_Half  | SUPPORTED_10baseT_Full  |
SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
SUPPORTED_TP | SUPPORTED_MII | SUPPORTED_FIBRE);
-   ecmd->phy_address = np->phy_addr_external;
+   ecmd->base.phy_address = np->phy_addr_external;
/*
 * We intentionally report the phy address of the external
 * phy, even if the internal phy is used. This is necessary
@@

Re: [PATCHv2 net-next] net: dsa: mv88e6xxx: Move forward declaration to where it is needed

2017-02-09 Thread David Miller

From: Andrew Lunn 
Date: Thu,  9 Feb 2017 00:00:43 +0100

> Move it out from the middle for the #defines to just before it is
> needed.
> 
> Signed-off-by: Andrew Lunn 
> Reviewed-by: Vivien Didelot 

Applied, thanks for respinning.

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 2:34 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 9 Feb 2017 14:26:50 -0800
>
>> On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
>>> From: Tom Herbert 
>>> Date: Wed, 8 Feb 2017 15:41:20 -0800
>>>
>>>> These hooks are also generic to allow for XDP/BPF programs as well
>>>> as non-BPF code (e.g. kernel code can be written in a module).
>>>
>>> I don't think we should even remotely consider surrendering the XDP
>>> hook to module code.
>>>
>>> We restrict it to eBPF for a reason, because that framework is
>>> restricted in what it can do, what it can access, and how it can do
>>> so.
>>>
>> Kernel modules go through extensive netdev review before they are
>> taken into the kernel, for BPF programs we just allow what any user
>> gives us without any peer review even implied.
>
> We can actually control what externally written XDP eBPF programs can
> do, for kernel modules we have no such control or influence.  This
> hook runs right in the driver and bypasses the entire stack, it has to
> execute in a hardened thing that cannot crash and it will not as long
> as BPF verifier is correct.
>
> And you're going to make it even more complicated what XDP offload in
> hardware actually means.  With eBPF it is very clearly defined what
> the necessary execution engine is.
>
> Tom I'm strongly against being allowed to run arbitrary module code
> from the XDP hook, sorry.
>
> It is as important as the distinction between full stack offload and
> partial offload in those nice charts in your talks. :-)
>
Yes it is. And the relevant principle that I would draw from that is
that "offload" means offloading functionality from the kernel **to**
the device. Restricting what we implement in the kernel on the basis
of whether or not it can be offloaded to a device is completely
backwards in this regard.

Tom

[PATCH 1/1] net: ethernet: intel: e1000: msleep() is unreliable for anything <20ms

2017-02-09 Thread Saber Rezvani

Fix the checkpatch.pl issue:
WARNING: msleep < 20ms can sleep for up to 20ms; see
Documentation/timers/timers-howto.txt

Signed-off-by: Saber Rezvani 
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 93fc6c6..5403fa2 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -489,7 +489,7 @@ static void e1000_power_down_phy(struct e1000_adapter 
*adapter)
e1000_read_phy_reg(hw, PHY_CTRL, &mii_reg);
mii_reg |= MII_CR_POWER_DOWN;
e1000_write_phy_reg(hw, PHY_CTRL, mii_reg);
-   msleep(1);
+   usleep_range(1000, 5000);
}
 out:
return;
@@ -536,7 +536,7 @@ void e1000_down(struct e1000_adapter *adapter)
ew32(TCTL, tctl);
/* flush both disables and wait for them to finish */
E1000_WRITE_FLUSH();
-   msleep(10);
+   usleep_range(10000, 20000);
 
napi_disable(&adapter->napi);
 
@@ -560,7 +560,7 @@ void e1000_reinit_locked(struct e1000_adapter *adapter)
 {
WARN_ON(in_interrupt());
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
-   msleep(1);
+   usleep_range(1000, 5000);
e1000_down(adapter);
e1000_up(adapter);
clear_bit(__E1000_RESETTING, &adapter->flags);
@@ -3569,7 +3569,7 @@ static int e1000_change_mtu(struct net_device *netdev, 
int new_mtu)
}
 
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
-   msleep(1);
+   usleep_range(1000, 5000);
/* e1000_down has a dependency on max_frame_size */
hw->max_frame_size = max_frame;
if (netif_running(netdev)) {
-- 
2.7.4
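
The checkpatch warning quoted in this patch comes from the way msleep() rounds to timer ticks. A minimal userspace sketch of that rounding, assuming msleep() sleeps for msecs_to_jiffies(msecs) + 1 ticks as in kernels of this period; the model_* names and the hz parameter are illustrative and are not kernel API, and scheduler latency is ignored:

```c
#include <assert.h>

/*
 * Userspace model (an assumption, not kernel code): convert
 * milliseconds to timer ticks, rounding up to whole ticks.
 */
static unsigned int model_msecs_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned int)(((unsigned long long)ms * hz + 999) / 1000);
}

/*
 * Upper bound, in milliseconds, of the modelled msleep(ms): the
 * rounded-up tick count plus one extra tick so the sleep never ends
 * early.
 */
static unsigned int model_msleep_upper_bound_ms(unsigned int ms,
						unsigned int hz)
{
	unsigned int ticks = model_msecs_to_ticks(ms, hz) + 1;

	return ticks * 1000 / hz;
}
```

With hz = 100 the model gives 20 for a 1 ms request, which matches the "can sleep for up to 20ms" wording; with hz = 1000 the same request gives 2. Note also that the e1000_down() hunk replaces msleep(10) with usleep_range(1000, 5000), which lowers the minimum wait from 10 ms to 1 ms; whether that is safe for the hardware is not established in this message.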

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread David Miller

From: Tom Herbert 
Date: Thu, 9 Feb 2017 14:26:50 -0800

> On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
>> From: Tom Herbert 
>> Date: Wed, 8 Feb 2017 15:41:20 -0800
>>
>>> These hooks are also generic to allow for XDP/BPF programs as well
>>> as non-BPF code (e.g. kernel code can be written in a module).
>>
>> I don't think we should even remotely consider surrendering the XDP
>> hook to module code.
>>
>> We restrict it to eBPF for a reason, because that framework is
>> restricted in what it can do, what it can access, and how it can do
>> so.
>>
> Kernel modules go through extensive netdev review before they are
> taken into the kernel, for BPF programs we just allow what any user
> gives us without any peer review even implied.

We can actually control what externally written XDP eBPF programs can
do, for kernel modules we have no such control or influence.  This
hook runs right in the driver and bypasses the entire stack, it has to
execute in a hardened thing that cannot crash and it will not as long
as BPF verifier is correct.

And you're going to make it even more complicated what XDP offload in
hardware actually means.  With eBPF it is very clearly defined what
the necessary execution engine is.

Tom I'm strongly against being allowed to run arbitrary module code
from the XDP hook, sorry.

It is as important as the distinction between full stack offload and
partial offload in those nice charts in your talks. :-)

Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-09 Thread David Miller

From: Ivan Khoronzhuk 
Date: Thu,  9 Feb 2017 02:07:34 +0200

> These two patches fix suspend/resume chain.

Patch 2 doesn't apply cleanly to the 'net' tree, please
respin this series.

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread Tom Herbert

On Thu, Feb 9, 2017 at 2:17 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Wed, 8 Feb 2017 15:41:20 -0800
>
>> These hooks are also generic to allow for XDP/BPF programs as well
>> as non-BPF code (e.g. kernel code can be written in a module).
>
> I don't think we should even remotely consider surrendering the XDP
> hook to module code.
>
> We restrict it to eBPF for a reason, because that framework is
> restricted in what it can do, what it can access, and how it can do
> so.
>
Kernel modules go through extensive netdev review before they are
taken into the kernel, for BPF programs we just allow what any user
gives us without any peer review even implied. For this reason, I
simply don't believe that BPF is magically more robust code than what
is in a kernel module. Or to put it another way, do you think DPDK is
going to put any restrictions on what a user can do over raw queues in
userspace? If we put artificial limits on XDP, like it can only ever
be BPF, then we are just closing the door to its full potential and
giving more fodder for the userspace stacks to claim superiority.

> Tom if you're going to do a cleanup that makes it so that drivers
> need less code to support XDP, that is awesome but please do only
> that.
>
> Don't combine it with more controversial changes.
>
We need this for TXDP; I have no interest in rewriting the TCP stack in BPF :-)

Tom

> Thank you.

[PATCH 1/1] net: ethernet: intel: e1000: add space after close brace

2017-02-09 Thread Saber Rezvani

Fixes checkpatch.pl error:
ERROR: space required after that close brace '}'

Signed-off-by: Saber Rezvani 
---
 drivers/net/ethernet/intel/e1000/e1000_param.c | 28 +-
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_param.c b/drivers/net/ethernet/intel/e1000/e1000_param.c
index c9cde35..4a8a38a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_param.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_param.c
@@ -291,7 +291,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.arg  = { .r = {
.min = E1000_MIN_TXD,
.max = mac_type < e1000_82544 ? E1000_MAX_TXD : 
E1000_MAX_82544_TXD
-   }}
+   } }
};
 
if (num_TxDescriptors > bd) {
@@ -320,7 +320,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.min = E1000_MIN_RXD,
.max = mac_type < e1000_82544 ? E1000_MAX_RXD :
   E1000_MAX_82544_RXD
-   }}
+   } }
};
 
if (num_RxDescriptors > bd) {
@@ -366,7 +366,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "reading default settings from EEPROM",
.def  = E1000_FC_DEFAULT,
.arg  = { .l = { .nr = ARRAY_SIZE(fc_list),
-.p = fc_list }}
+.p = fc_list } }
};
 
if (num_FlowControl > bd) {
@@ -384,7 +384,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "using default of " 
__MODULE_STRING(DEFAULT_TIDV),
.def  = DEFAULT_TIDV,
.arg  = { .r = { .min = MIN_TXDELAY,
-.max = MAX_TXDELAY }}
+.max = MAX_TXDELAY } }
};
 
if (num_TxIntDelay > bd) {
@@ -402,7 +402,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "using default of " 
__MODULE_STRING(DEFAULT_TADV),
.def  = DEFAULT_TADV,
.arg  = { .r = { .min = MIN_TXABSDELAY,
-.max = MAX_TXABSDELAY }}
+.max = MAX_TXABSDELAY } }
};
 
if (num_TxAbsIntDelay > bd) {
@@ -420,7 +420,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "using default of " 
__MODULE_STRING(DEFAULT_RDTR),
.def  = DEFAULT_RDTR,
.arg  = { .r = { .min = MIN_RXDELAY,
-.max = MAX_RXDELAY }}
+.max = MAX_RXDELAY } }
};
 
if (num_RxIntDelay > bd) {
@@ -438,7 +438,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "using default of " 
__MODULE_STRING(DEFAULT_RADV),
.def  = DEFAULT_RADV,
.arg  = { .r = { .min = MIN_RXABSDELAY,
-.max = MAX_RXABSDELAY }}
+.max = MAX_RXABSDELAY } }
};
 
if (num_RxAbsIntDelay > bd) {
@@ -456,7 +456,7 @@ void e1000_check_options(struct e1000_adapter *adapter)
.err  = "using default of " 
__MODULE_STRING(DEFAULT_ITR),
.def  = DEFAULT_ITR,
.arg  = { .r = { .min = MIN_ITR,
-.max = MAX_ITR }}
+.max = MAX_ITR } }
};
 
if (num_InterruptThrottleRate > bd) {
@@ -570,7 +570,7 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter)
{  0, "" },
{   SPEED_10, "" },
{  SPEED_100, "" },
-   { SPEED_1000, "" }};
+   { SPEED_1000, "" } };
 
opt = (struct e1000_option) {
.type = list_option,
@@ -578,7 +578,7 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter)
.err  = "parameter ignored",
.def  = 0,
.arg  = { .l = { .nr = ARRAY_SIZE(speed_list),
-.p = speed_list }}
+.p = speed_list } }
};
 
if (num_Speed > bd) {
@@ -592,7 +592,7 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter)

Re: [PATCH 0/3 v2 net-next] enic: add vxlan offload support

2017-02-09 Thread David Miller

From: Govindarajulu Varadarajan 
Date: Wed,  8 Feb 2017 16:43:06 -0800

> This series adds vxlan offload support for enic driver. The first
> patch adds vxlan devcmd for configuring vxlan offload parameters.
> Second patch adds ndo_udp_tunnel_add/del and offload on rx path.
> There are two modes in which fw supports vxlan offload.
> 
> mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
> if checksum of csum is ok. This bit is the OR of ip_csum_ok and
> tcp_udp_csum_ok
> 
> mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
> BIT(1) is set if outer_ip_csum_ok/
> BIT(2) is set if outer_tcp_csum_ok
> 
> Some hw supports only mode 0, some support mode 0 and 2. Driver gets
> the supported modes bitmap using get_supported_feature_ver devcmd
> and selects the highest mode both driver and fw supports.
> 
> Third patch adds offload support on tx path by adding
> enic_features_check().
> 
> v2: Order local variable declarations from longest to shortest line,
> on all three patches.

Series applied to net-next, thanks.
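
The mode 2 bit layout and the "highest mode both driver and fw support" selection described in the cover letter can be sketched as follows. This is a userspace illustration only: the sketch_* names are hypothetical, and the encoding "bit N set means mode N is supported" is an assumption about the supported-modes bitmap, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BIT(n) (1u << (n))

/* Hypothetical decoded view of the mode 2 flag bits in rss_hash. */
struct sketch_vxlan_flags {
	bool encapsulated;	/* BIT(0) */
	bool outer_ip_csum_ok;	/* BIT(1) */
	bool outer_tcp_csum_ok;	/* BIT(2) */
};

static struct sketch_vxlan_flags sketch_decode_mode2(uint32_t rss_hash)
{
	struct sketch_vxlan_flags f = {
		.encapsulated      = (rss_hash & SKETCH_BIT(0)) != 0,
		.outer_ip_csum_ok  = (rss_hash & SKETCH_BIT(1)) != 0,
		.outer_tcp_csum_ok = (rss_hash & SKETCH_BIT(2)) != 0,
	};

	return f;
}

/*
 * Pick the highest mode present in both bitmaps (assumed encoding:
 * bit N set means mode N is supported). Returns -1 when the two sides
 * have no mode in common.
 */
static int sketch_select_mode(uint32_t fw_modes, uint32_t drv_modes)
{
	uint32_t common = fw_modes & drv_modes;
	int mode = -1;

	while (common) {
		mode++;
		common >>= 1;
	}
	return mode;
}
```

For example, firmware advertising modes 0 and 2 (0x5) against a driver that also supports both selects mode 2, while firmware that only offers mode 0 falls back to mode 0.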

[PATCH] net: myricom: myri10ge: use new api ethtool_{get|set}_link_ksettings

2017-02-09 Thread Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c |   23 +
 1 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 1139d18..b171ed2 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -1610,15 +1610,16 @@ static irqreturn_t myri10ge_intr(int irq, void *arg)
 }
 
 static int
-myri10ge_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
+myri10ge_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
 {
struct myri10ge_priv *mgp = netdev_priv(netdev);
char *ptr;
int i;
 
-   cmd->autoneg = AUTONEG_DISABLE;
-   ethtool_cmd_speed_set(cmd, SPEED_10000);
-   cmd->duplex = DUPLEX_FULL;
+   cmd->base.autoneg = AUTONEG_DISABLE;
+   cmd->base.speed = SPEED_10000;
+   cmd->base.duplex = DUPLEX_FULL;
 
/*
 * parse the product code to deterimine the interface type
@@ -1643,16 +1644,12 @@ static irqreturn_t myri10ge_intr(int irq, void *arg)
ptr++;
if (*ptr == 'R' || *ptr == 'Q' || *ptr == 'S') {
/* We've found either an XFP, quad ribbon fiber, or SFP+ */
-   cmd->port = PORT_FIBRE;
-   cmd->supported |= SUPPORTED_FIBRE;
-   cmd->advertising |= ADVERTISED_FIBRE;
+   cmd->base.port = PORT_FIBRE;
+   ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
+   ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
} else {
-   cmd->port = PORT_OTHER;
+   cmd->base.port = PORT_OTHER;
}
-   if (*ptr == 'R' || *ptr == 'S')
-   cmd->transceiver = XCVR_EXTERNAL;
-   else
-   cmd->transceiver = XCVR_INTERNAL;
 
return 0;
 }
@@ -1925,7 +1922,6 @@ static int myri10ge_led(struct myri10ge_priv *mgp, int on)
 }
 
 static const struct ethtool_ops myri10ge_ethtool_ops = {
-   .get_settings = myri10ge_get_settings,
.get_drvinfo = myri10ge_get_drvinfo,
.get_coalesce = myri10ge_get_coalesce,
.set_coalesce = myri10ge_set_coalesce,
@@ -1939,6 +1935,7 @@ static int myri10ge_led(struct myri10ge_priv *mgp, int on)
.set_msglevel = myri10ge_set_msglevel,
.get_msglevel = myri10ge_get_msglevel,
.set_phys_id = myri10ge_phys_id,
+   .get_link_ksettings = myri10ge_get_link_ksettings,
 };
 
 static int myri10ge_allocate_rings(struct myri10ge_slice_state *ss)
-- 
1.7.4.4

Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

2017-02-09 Thread David Miller

From: Tom Herbert 
Date: Wed, 8 Feb 2017 15:41:20 -0800

> These hooks are also generic to allow for XDP/BPF programs as well
> as non-BPF code (e.g. kernel code can be written in a module).

I don't think we should even remotely consider surrendering the XDP
hook to module code.

We restrict it to eBPF for a reason, because that framework is
restricted in what it can do, what it can access, and how it can do
so.

Tom if you're going to do a cleanup that makes it so that drivers
need less code to support XDP, that is awesome but please do only
that.

Don't combine it with more controversial changes.

Thank you.

Re: [PATCH] [net-next] ARM: orion: remove unused wnr854t_switch_plat_data

2017-02-09 Thread David Miller

From: Arnd Bergmann 
Date: Wed,  8 Feb 2017 22:24:19 +0100

> The other instances of this structure got removed along with the MDIO
> device change, but this one was left behind and needs to be removed
> as well:
> 
> arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 
> 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
>  static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
> 
> Fixes: 575e93f7b5e6 ("ARM: orion: Register DSA switch as a MDIO device")
> Signed-off-by: Arnd Bergmann 

Applied.

Re: [PATCH net v3 1/2] net: ethernet: bgmac: init sequence bug

2017-02-09 Thread David Miller

From: Jon Mason 
Date: Wed,  8 Feb 2017 16:12:56 -0500

> The code now checks to see if the adapter needs to be brought out of
> reset (whereas before it was doing an IDM write to bring it out of
> reset regardless of whether it was in reset or not).  Also, removed
> unnecessary usleeps (as there is already a read present to flush the
> IDM writes).

That's not what the delays are there for; I don't think you can
safely remove them.

They are there to wait for the reset to complete, once the read back
has propagated the write to the register.

Please retain the delays.

Thanks.
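
The distinction drawn in this reply, that the read back only flushes the write while the delay waits for the reset itself, can be illustrated with a toy userspace model. Everything here is hypothetical (the sketch_* names, the three-tick reset time); it is not the bgmac code and says nothing about the real hardware timing.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RESET_TICKS 3u	/* hypothetical time the reset needs */

struct sketch_dev {
	bool write_posted;		/* write issued, not yet at the device */
	unsigned int reset_ticks_left;	/* time until the reset has finished */
};

static void sketch_write_reset(struct sketch_dev *d)
{
	d->write_posted = true;
}

/* Reading back forces the posted write out to the device. */
static void sketch_read_back(struct sketch_dev *d)
{
	if (d->write_posted) {
		d->write_posted = false;
		d->reset_ticks_left = SKETCH_RESET_TICKS;
	}
}

static void sketch_delay(struct sketch_dev *d, unsigned int ticks)
{
	if (ticks > d->reset_ticks_left)
		ticks = d->reset_ticks_left;
	d->reset_ticks_left -= ticks;
}

/* Returns true when the device is usable at the end of the sequence. */
static bool sketch_reset(bool keep_delay)
{
	struct sketch_dev d = { false, 0 };

	sketch_write_reset(&d);
	sketch_read_back(&d);	/* the write has now reached the device */
	if (keep_delay)
		sketch_delay(&d, SKETCH_RESET_TICKS);	/* wait for the reset */
	return !d.write_posted && d.reset_ticks_left == 0;
}
```

In this model sketch_reset(true) reports a usable device and sketch_reset(false) does not, even though the read back happened in both cases.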

Re: [PATCH net] net: phy: Initialize mdio clock at probe function

2017-02-09 Thread David Miller

From: Jon Mason 
Date: Wed,  8 Feb 2017 17:14:26 -0500

> From: Yendapally Reddy Dhananjaya Reddy 
> 
> USB PHYs need the MDIO clock divisor enabled earlier to work.
> Initialize mdio clock divisor in probe function. The ext bus
> bit available in the same register will be used by mdio mux
> to enable external mdio.
> 
> Signed-off-by: Yendapally Reddy Dhananjaya Reddy 
> 
> Fixes: ddc24ae1 ("net: phy: Broadcom iProc MDIO bus driver")
> Reviewed-by: Florian Fainelli 
> Signed-off-by: Jon Mason 

Applied.

Re: [PATCH net-next] net: dsa: Fix duplicate object rule

2017-02-09 Thread David Miller

From: Florian Fainelli 
Date: Wed,  8 Feb 2017 14:40:04 -0800

> While adding switch.o to the list of DSA object files, we essentially
> duplicated the previous obj-y line and just added switch.o, remove the
> duplicate.
> 
> Fixes: f515f192ab4f ("net: dsa: add switch notifier")
> Signed-off-by: Florian Fainelli 

Applied.

Re: [PATCH 0/2] net: qcom/emac: add the last ethtool functions

2017-02-09 Thread David Miller

From: Timur Tabi 
Date: Wed,  8 Feb 2017 15:49:26 -0600

> These two patches implement the remaining two ethtool functions that
> are of interest to the Qualcomm EMAC driver.  These are the last 
> patches that will be submitted for the 4.11 merge window.

Series applied to net-next, thanks.

Re: [PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces

2017-02-09 Thread David Miller

From: Sainath Grandhi 
Date: Wed,  8 Feb 2017 13:37:09 -0800

> Tap character devices can be implemented on other virtual interfaces like
> ipvlan, similar to macvtap. Source code for tap functionality in macvtap
> can be re-used for this purpose.
> 
> This patch series splits macvtap source into two modules, macvtap and tap.
> This patch series also includes a patch for implementing tap character
> device driver based on the IP-VLAN network interface, called ipvtap.
> 
> These patches are tested on x86 platform.

I get rejects on patch #7 when I try to apply this to net-next,
please respin.

Re: [PATCHv4 net-next 0/7] net: dst_confirm replacement

2017-02-09 Thread David Miller

From: Julian Anastasov 
Date: Wed, 8 Feb 2017 23:05:36 +0200 (EET)

> On Tue, 7 Feb 2017, David Miller wrote:
> 
>> > - Now may be old function neigh_output() should be restored
>> > instead of dst_neigh_output?
>> 
>> Please elaborate.
> 
>   As dst_neigh_output does not use the dst arg anymore,
> my idea was to rename dst_neigh_output to neigh_output, i.e.
> just like it was before commit 5110effee8fd ("net: Do delayed neigh
> confirmation.").

This sounds like a great idea.

Re: [PATCHv6 net-next 0/6] sctp: add sender-side procedures for stream reconf asoc reset and add streams

2017-02-09 Thread David Miller

From: Xin Long 
Date: Thu,  9 Feb 2017 01:18:14 +0800

> Patch 4/6 is to implement sender-side procedures for the SSN/TSN Reset
> Request Parameter described in rfc6525 section 5.1.4, patch 3/6 is
> ahead of it to define a function to make the request chunk for it.
> 
> Patch 6/6 is to implement sender-side procedures for the Add Incoming
> and Outgoing Streams Request Parameter described in
> rfc6525 section 5.1.5 and 5.1.6, patch 5/6 is ahead of it to define a
> function to make the request chunk for it.
> 
> Patch 2/6 is a fix to recover streams states when it fails to send
> request and Patch 1/6 is to drop some unnecessary __packed from some
> old structures.

Series applied with stray __packed reference removed from the commit
message of patch #3.

Re: [PATCH net] igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()

2017-02-09 Thread David Miller

From: Hangbin Liu 
Date: Wed,  8 Feb 2017 21:16:45 +0800

> In function igmpv3/mld_add_delrec() we allocate pmc and put it in
> idev->mc_tomb, so we should free it when we don't need it in del_delrec().
> But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.
> 
> Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
> Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
> Reported-by: Daniel Borkmann 
> Signed-off-by: Hangbin Liu 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next 0/6] sfc: more encap offloads

2017-02-09 Thread David Miller

From: Edward Cree 
Date: Wed, 8 Feb 2017 16:49:12 +0000

> This patch series adds support for RX checksum offload of encapsulated packets.
> It also adds support for configuring the hardware's lists of UDP ports used for
> VXLAN and GENEVE encapsulation offloads.  Since changing these lists causes the
> MC to reboot, the driver has been hardened against reboots, which used to be
> considered an exceptional occurrence but are now normal.

Series applied, thanks.

[PATCH] net: microchip: encx24j600: use new api ethtool_{get|set}_link_ksettings

2017-02-09 Thread Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/microchip/encx24j600.c |   32 +++---
 1 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/microchip/encx24j600.c b/drivers/net/ethernet/microchip/encx24j600.c
index fbce616..f831238 100644
--- a/drivers/net/ethernet/microchip/encx24j600.c
+++ b/drivers/net/ethernet/microchip/encx24j600.c
@@ -940,29 +940,33 @@ static void encx24j600_get_drvinfo(struct net_device *dev,
sizeof(info->bus_info));
 }
 
-static int encx24j600_get_settings(struct net_device *dev,
-  struct ethtool_cmd *cmd)
+static int encx24j600_get_link_ksettings(struct net_device *dev,
+struct ethtool_link_ksettings *cmd)
 {
struct encx24j600_priv *priv = netdev_priv(dev);
+   u32 supported;
 
-   cmd->transceiver = XCVR_INTERNAL;
-   cmd->supported = SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
+   supported = SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
 SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
 SUPPORTED_Autoneg | SUPPORTED_TP;
 
-   ethtool_cmd_speed_set(cmd, priv->speed);
-   cmd->duplex = priv->full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
-   cmd->port = PORT_TP;
-   cmd->autoneg = priv->autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+
+   cmd->base.speed = priv->speed;
+   cmd->base.duplex = priv->full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
+   cmd->base.port = PORT_TP;
+   cmd->base.autoneg = priv->autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
 
return 0;
 }
 
-static int encx24j600_set_settings(struct net_device *dev,
-  struct ethtool_cmd *cmd)
+static int
+encx24j600_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
 {
-   return encx24j600_setlink(dev, cmd->autoneg,
- ethtool_cmd_speed(cmd), cmd->duplex);
+   return encx24j600_setlink(dev, cmd->base.autoneg,
+ cmd->base.speed, cmd->base.duplex);
 }
 
 static u32 encx24j600_get_msglevel(struct net_device *dev)
@@ -980,13 +984,13 @@ static void encx24j600_set_msglevel(struct net_device *dev, u32 val)
 }
 
 static const struct ethtool_ops encx24j600_ethtool_ops = {
-   .get_settings = encx24j600_get_settings,
-   .set_settings = encx24j600_set_settings,
.get_drvinfo = encx24j600_get_drvinfo,
.get_msglevel = encx24j600_get_msglevel,
.set_msglevel = encx24j600_set_msglevel,
.get_regs_len = encx24j600_get_regs_len,
.get_regs = encx24j600_get_regs,
+   .get_link_ksettings = encx24j600_get_link_ksettings,
+   .set_link_ksettings = encx24j600_set_link_ksettings,
 };
 
 static const struct net_device_ops encx24j600_netdev_ops = {
-- 
1.7.4.4
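
The patch above builds a legacy u32 of SUPPORTED_* flags and hands it to ethtool_convert_legacy_u32_to_link_mode(). As a rough userspace sketch of what such a conversion amounts to, assuming it clears the wider link-mode bitmap and stores the legacy word in its first element: the sketch_* names, the 64-bit bitmap width and the flag positions below are illustrative stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LINK_MODE_NBITS 64	/* illustrative bitmap width */
#define SKETCH_BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LINK_MODE_LONGS \
	((SKETCH_LINK_MODE_NBITS + SKETCH_BITS_PER_LONG - 1) / \
	 SKETCH_BITS_PER_LONG)

/* Legacy style flags; the bit positions here are illustrative. */
#define SKETCH_SUPPORTED_10baseT_Half	(1u << 0)
#define SKETCH_SUPPORTED_10baseT_Full	(1u << 1)
#define SKETCH_SUPPORTED_Autoneg	(1u << 6)
#define SKETCH_SUPPORTED_TP		(1u << 7)

/* Clear the whole bitmap, then place the legacy word in element 0. */
static void sketch_legacy_u32_to_link_mode(unsigned long *dst,
					   uint32_t legacy)
{
	memset(dst, 0, SKETCH_LINK_MODE_LONGS * sizeof(*dst));
	dst[0] = legacy;
}

static int sketch_test_link_mode(const unsigned long *mask, unsigned int bit)
{
	return (int)((mask[bit / SKETCH_BITS_PER_LONG] >>
		      (bit % SKETCH_BITS_PER_LONG)) & 1);
}

/* Convert a legacy word and report whether one link-mode bit is set. */
static int sketch_legacy_has_mode(uint32_t legacy, unsigned int bit)
{
	unsigned long mask[SKETCH_LINK_MODE_LONGS];

	sketch_legacy_u32_to_link_mode(mask, legacy);
	return sketch_test_link_mode(mask, bit);
}
```

Bits above 31 are always clear after such a conversion, which is why drivers that only ever reported legacy modes can use it unchanged.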

Re: [PATCH v3] xen-netfront: Improve error handling during initialization

2017-02-09 Thread David Miller

From: Ross Lagerwall 
Date: Wed, 8 Feb 2017 10:57:37 +0000

> This fixes a crash when running out of grant refs when creating many
> queues across many netdevs.
> 
> * If creating queues fails (i.e. there are no grant refs available),
> call xenbus_dev_fatal() to ensure that the xenbus device is set to the
> closed state.
> * If no queues are created, don't call xennet_disconnect_backend as
> netdev->real_num_tx_queues will not have been set correctly.
> * If setup_netfront() fails, ensure that all the queues created are
> cleaned up, not just those that have been set up.
> * If any queues were set up and an error occurs, call
> xennet_destroy_queues() to clean up the napi context.
> * If any fatal error occurs, unregister and destroy the netdev to avoid
> leaving around a half setup network device.
> 
> Signed-off-by: Ross Lagerwall 

Applied.

Re: [PATCH net] net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

2017-02-09 Thread David Miller

From: Florian Fainelli 
Date: Wed,  8 Feb 2017 19:05:26 -0800

> The Generic PHY drivers get assigned after we checked that the current
> PHY driver is NULL, so we need to check a few things before we can
> safely dereference d->driver. This would be causing a NULL dereference to
> occur when a system binds to the Generic PHY driver. Update
> phy_attach_direct() to do the following:
> 
> - grab the driver module reference after we have assigned the Generic
>   PHY drivers accordingly, and remember we came from the generic PHY
>   path
> 
> - update the error path to clean up the module reference in case the
>   Generic PHY probe function fails
> 
> - split the error path involving phy_detach() to avoid double free/put
>   since phy_detach() does all the clean up
> 
> - finally, have phy_detach() drop the module reference count before we
>   call device_release_driver() for the Generic PHY driver case
> 
> Fixes: cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
> Signed-off-by: Florian Fainelli 

Applied to 'net', thanks Florian.

Re: [patch net-next 5/6] sched: add missing curly braces in else branch in tc_ctl_tfilter

2017-02-09 Thread Jiri Pirko

Thu, Feb 09, 2017 at 07:27:10PM CET, j...@mojatatu.com wrote:
>On 17-02-09 08:38 AM, Jiri Pirko wrote:
>> From: Jiri Pirko 
>> 
>> Curly braces need to be there, for stylistic reasons.
>> 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  net/sched/cls_api.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index f44378c..48864ad 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -315,8 +315,9 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
>>  err = -EINVAL;
>>  goto errout;
>>  }
>> -} else
>> +} else {
>>  tp = NULL;
>> +}
>>  break;
>>  }
>>  }
>> 
>
>Jiri, shall we engage in a long discussion about which rule says that
>you can put braces around one line branching? ;->

scripts/checkpatch.pl :)


>
>Acked-by: Jamal Hadi Salim 
>
>cheers,
>jamal

Re: [PATCH v4 net-next 09/10] openvswitch: Add force commit.

2017-02-09 Thread Joe Stringer

On 9 February 2017 at 11:22, Jarno Rajahalme  wrote:
> Stateful network admission policy may allow connections to one
> direction and reject connections initiated in the other direction.
> After policy change it is possible that for a new connection an
> overlapping conntrack entry already exists, where the original
> direction of the existing connection is opposed to the new
> connection's initial packet.
>
> Most importantly, conntrack state relating to the current packet gets
> the "reply" designation based on whether the original direction tuple
> or the reply direction tuple matched.  If this "directionality" is
> wrong w.r.t. the stateful network admission policy it may happen
> that packets in neither direction are correctly admitted.
>
> This patch adds a new "force commit" option to the OVS conntrack
> action that checks the original direction of an existing conntrack
> entry.  If that direction is opposed to the current packet, the
> existing conntrack entry is deleted and a new one is subsequently
> created in the correct direction.
>
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Acked-by: Joe Stringer
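
The "force commit" behaviour described in the quoted commit message can be sketched with a single-entry toy table. This is a userspace illustration under stated assumptions: the sketch_* types are hypothetical, endpoints are plain integers, and the lookup is assumed to have already matched the entry in one direction or the other; it is not the OVS conntrack code.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_dir { SKETCH_DIR_ORIGINAL, SKETCH_DIR_REPLY };

/* One tracked connection, recorded in its original direction. */
struct sketch_conn {
	bool valid;
	int src;
	int dst;
};

static struct sketch_conn sketch_entry;	/* toy one-slot table */

/* Direction of a packet src -> dst relative to an existing entry. */
static enum sketch_dir sketch_packet_dir(const struct sketch_conn *c,
					 int src, int dst)
{
	return (c->src == src && c->dst == dst) ? SKETCH_DIR_ORIGINAL
						 : SKETCH_DIR_REPLY;
}

/*
 * Commit a connection for a packet src -> dst. With force set, an
 * existing entry whose original direction is opposed to the packet is
 * deleted and recreated in the packet's direction. Returns true when
 * an entry was created or recreated.
 */
static bool sketch_commit(struct sketch_conn *c, int src, int dst, bool force)
{
	if (c->valid) {
		if (!force ||
		    sketch_packet_dir(c, src, dst) == SKETCH_DIR_ORIGINAL)
			return false;	/* keep the existing entry */
		c->valid = false;	/* drop the wrong-direction entry */
	}
	c->valid = true;
	c->src = src;
	c->dst = dst;
	return true;
}
```

Without force, a packet travelling 2 -> 1 against an entry recorded as 1 -> 2 is treated as a reply and the entry is left alone; with force, the entry is replaced so that 2 -> 1 becomes the original direction.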

Re: [PATCH v4 net-next 05/10] openvswitch: Simplify labels length logic.

2017-02-09 Thread Joe Stringer

On 9 February 2017 at 11:21, Jarno Rajahalme  wrote:
> Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
> distinct labels"), the size of conntrack labels extension has been fixed to
> 128 bits, so we do not need to check for labels sizes shorter than 128
> at run-time.  This patch simplifies labels length logic accordingly,
> but allows the conntrack labels size to be increased in the future
> without breaking the build.  In the event of conntrack labels
> increasing in size OVS would still be able to deal with the 128 first
> label bits.
>
> Suggested-by: Joe Stringer 
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Acked-by: Joe Stringer
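
One way to read the quoted description is a compile-time check that the conntrack labels are at least 128 bits, plus a copy of only the first 128 bits. The sketch below is a userspace illustration of that idea; the sketch_* names and sizes are hypothetical and this is not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OVS_LABELS_LEN 16	/* 128 bits handled by the consumer */
#define SKETCH_CT_LABELS_LEN  16	/* hypothetical source size, may grow */

struct sketch_ct_labels  { uint8_t bits[SKETCH_CT_LABELS_LEN]; };
struct sketch_ovs_labels { uint8_t bits[SKETCH_OVS_LABELS_LEN]; };

/* Growing the source size keeps building; shrinking it below 128 bits does not. */
_Static_assert(sizeof(struct sketch_ct_labels) >=
	       sizeof(struct sketch_ovs_labels),
	       "conntrack labels must hold at least 128 bits");

static struct sketch_ct_labels sketch_ct = { .bits = { 0xab, 0xcd } };
static struct sketch_ovs_labels sketch_out;

/* Copy the first 128 label bits; returns the number of bytes copied. */
static size_t sketch_get_labels(const struct sketch_ct_labels *ct,
				struct sketch_ovs_labels *out)
{
	memcpy(out->bits, ct->bits, sizeof(out->bits));
	return sizeof(out->bits);
}
```

Raising SKETCH_CT_LABELS_LEN to 32 still compiles and still copies 16 bytes, which mirrors the "deal with the 128 first label bits" wording.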

Re: [PATCH v4 net-next 01/10] openvswitch: Fix comments for skb->_nfct

2017-02-09 Thread Joe Stringer

On 9 February 2017 at 11:21, Jarno Rajahalme  wrote:
> Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
> they are combined into '_nfct'.
>
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Acked-by: Joe Stringer

Re: [PATCH v4 net-next 07/10] openvswitch: Inherit master's labels.

2017-02-09 Thread Joe Stringer

On 9 February 2017 at 11:21, Jarno Rajahalme  wrote:
> We avoid calling into nf_conntrack_in() for expected connections, as
> that would remove the expectation that we want to stick around until
> we are ready to commit the connection.  Instead, we do a lookup in the
> expectation table directly.  However, after a successful expectation
> lookup we have set the flow key label field from the master
> connection, whereas nf_conntrack_in() does not do this.  This leads to
> master's labels being inherited after an expectation lookup, but those
> labels not being inherited after the corresponding conntrack action
> with a commit flag.
>
> This patch resolves the problem by changing the commit code path to
> also inherit the master's labels to the expected connection.
> Resolving this conflict in favor of inheriting the labels allows more
> information to be passed from the master connection to related
> connections, which would otherwise be much harder if the 32 bits in
> the connmark are not enough.  Labels can still be set explicitly, so
> this change only affects the default values of the labels in the presence
> of a master connection.
>
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Acked-by: Joe Stringer

Re: [PATCH v3 0/2] Fixes for sierra_net driver

2017-02-09 Thread David Miller

From: Stefan Brüns 
Date: Wed, 8 Feb 2017 02:46:31 +0100

> When trying to initiate a dual-stack (ipv4v6) connection, a MC7710, FW
> version SWI9200X_03.05.24.00ap answers with an unsupported LSI. Add support
> for this LSI.
> Also the link_type should be ignored when going idle, otherwise the modem
> is stuck in a bad link state.
> Tested on MC7710, T-Mobile DE, APN internet.telekom, IPv4v6 PDP type. Both
> IPv4 and IPv6 connections work.
> 
> v2: Do not overwrite protocol field in rx_fixup
> v3: Remove leftover struct ethhdr *eth declaration

Series applied, thanks.

Re: [Patch net] kcm: fix 0-length case for kcm_sendmsg()

2017-02-09 Thread David Miller

From: Cong Wang 
Date: Tue,  7 Feb 2017 12:59:47 -0800

> Dmitry reported a kernel warning:
> 
>  WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
>  kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
>  CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  Call Trace:
>   __dump_stack lib/dump_stack.c:15 [inline]
>   dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>   panic+0x1fb/0x412 kernel/panic.c:179
>   __warn+0x1c4/0x1e0 kernel/panic.c:539
>   warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>   kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
>   kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
>   sock_sendmsg_nosec net/socket.c:635 [inline]
>   sock_sendmsg+0xca/0x110 net/socket.c:645
>   sock_write_iter+0x326/0x600 net/socket.c:848
>   new_sync_write fs/read_write.c:499 [inline]
>   __vfs_write+0x483/0x740 fs/read_write.c:512
>   vfs_write+0x187/0x530 fs/read_write.c:560
>   SYSC_write fs/read_write.c:607 [inline]
>   SyS_write+0xfb/0x230 fs/read_write.c:599
>   entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> when calling syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul) on a KCM
> seqpacket socket. It appears that kcm_sendmsg() does not handle len==0
> case correctly, which causes an empty skb to be allocated and queued.
> Fix this by skipping the skb allocation for len==0 case.
> 
> Reported-by: Dmitry Vyukov 
> Cc: Tom Herbert 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable.
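
The fix described in the quoted message, skipping the allocation when len == 0, follows a common guard pattern. A minimal userspace sketch of that pattern, with hypothetical sketch_* names and malloc() standing in for skb allocation; it does not reproduce kcm_sendmsg():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct sketch_msg {
	struct sketch_msg *next;
	size_t len;
	unsigned char data[];
};

struct sketch_queue {
	struct sketch_msg *head;	/* ordering is irrelevant to this sketch */
	size_t count;
};

static struct sketch_queue sketch_q;

/*
 * Queue a copy of buf. Returns the number of bytes queued, or -1 on
 * allocation failure. A zero-length send succeeds without allocating
 * or queueing anything, so no empty message ever reaches the queue.
 */
static long sketch_sendmsg(struct sketch_queue *q, const void *buf,
			   size_t len)
{
	struct sketch_msg *m;

	if (len == 0)
		return 0;

	m = malloc(sizeof(*m) + len);
	if (!m)
		return -1;
	memcpy(m->data, buf, len);
	m->len = len;
	m->next = q->head;
	q->head = m;
	q->count++;
	return (long)len;
}
```

The later code that drains the queue can then rely on every queued message carrying at least one byte.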

Re: [PATCH] xen-netfront: Rework the fix for Rx stall during OOM and network stress

2017-02-09 Thread David Miller

From: Vineeth Remanan Pillai 
Date: Tue, 7 Feb 2017 18:59:01 +

> The commit 90c311b0eeea ("xen-netfront: Fix Rx stall during network
> stress and OOM") caused the refill timer to be triggered almost on
> all invocations of xennet_alloc_rx_buffers for certain workloads.
> This reworks the fix by reverting to the old behaviour and taking into
> consideration the skb allocation failure. Refill timer is now triggered
> on insufficient requests or skb allocation failure.
> 
> Signed-off-by: Vineeth Remanan Pillai 
> Fixes: 90c311b0eeea (xen-netfront: Fix Rx stall during network stress and OOM)
> Reported-by: Boris Ostrovsky 

Applied.

Re: [PATCH] netlink: move nla_put_{u8,u16,u32} out of line

2017-02-09 Thread David Miller

From: Arnd Bergmann 
Date: Wed,  8 Feb 2017 22:18:26 +0100

> When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
> stack frames in some functions. This goes unnoticed normally because
> CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
> 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
> KASAN=y").
> 
> The kernelci.org build bot however has the warning enabled and that led
> me to investigate it a little further, as every build produces these warnings:
> 
> net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> It turns out that there is a relatively simple workaround for the netlink
> users that currently use a local variable in order to do the type conversion:
> Moving the three functions (for each of the typical sizes) to lib/nlattr.c
> avoids using local variables in the caller, which drastically reduces the
> stack usage for nl80211 and br_netlink.
> 
> It would be good if we could enable the frame size check after that again,
> but that should be a separate patch and it requires some more testing
> to see which the largest acceptable frame size should be.
> 
> Cc: Andrey Ryabinin 
> Cc: Alexander Potapenko 
> Cc: Dmitry Vyukov 
> Cc: kasan-...@googlegroups.com
> Signed-off-by: Arnd Bergmann 

You should only make these routines extern (out of line) when KASAN is
enabled.

The reason is that uninlining these routines makes attribute emission
more expensive, and for some applications the performance of this matters.
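
One way to read the suggestion above is to keep the inline helper for normal
builds and switch to an out-of-line definition only under a build-time
option. The following is a minimal userspace sketch under that assumption,
not the actual kernel change: DEMO_KASAN stands in for CONFIG_KASAN, and
struct demo_msg, demo_put() and demo_put_u32() are hypothetical stand-ins
for the message buffer, nla_put() and nla_put_u32().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a netlink message buffer (not an sk_buff). */
struct demo_msg {
	unsigned char data[64];
	size_t len;
};

/* Hypothetical stand-in for nla_put(): append len bytes of payload. */
static int demo_put(struct demo_msg *m, const void *payload, size_t len)
{
	if (len > sizeof(m->data) - m->len)
		return -1;	/* no room left in the buffer */
	memcpy(m->data + m->len, payload, len);
	m->len += len;
	return 0;
}

#ifdef DEMO_KASAN
/*
 * Out of line: the by-value copy of 'value' lives in this function's
 * stack frame instead of in every caller's frame.
 */
int demo_put_u32(struct demo_msg *m, uint32_t value);

int demo_put_u32(struct demo_msg *m, uint32_t value)
{
	return demo_put(m, &value, sizeof(value));
}
#else
/* Inline fast path for builds without the instrumentation. */
static inline int demo_put_u32(struct demo_msg *m, uint32_t value)
{
	return demo_put(m, &value, sizeof(value));
}
#endif
```

Callers use demo_put_u32() the same way in both configurations; only the
linkage of the helper changes, so the cost of the extra call is paid only
when the option is set.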

Re: [PATCH net-next v4 04/11] bpf: Use bpf_load_program() from the library

2017-02-09 Thread Daniel Borkmann

On 02/08/2017 09:49 PM, Mickaël Salaün wrote:

Replace bpf_prog_load() with bpf_load_program() calls.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Shuah Khan 

[...]

diff --git a/tools/testing/selftests/bpf/test_tag.c b/tools/testing/selftests/bpf/test_tag.c
index 5f7c602f47d1..b77dc4b03e77 100644
--- a/tools/testing/selftests/bpf/test_tag.c
+++ b/tools/testing/selftests/bpf/test_tag.c
@@ -16,6 +16,8 @@
  #include 
  #include 

+#include 
+
  #include "../../../include/linux/filter.h"

  #include "bpf_sys.h"
@@ -55,8 +57,8 @@ static int bpf_try_load_prog(int insns, int fd_map,
int fd_prog;

bpf_filler(insns, fd_map);
-   fd_prog = bpf_prog_load(BPF_PROG_TYPE_SCHED_CLS, prog, insns *
-   sizeof(struct bpf_insn), "", NULL, 0);
+   fd_prog = bpf_load_program(BPF_PROG_TYPE_SCHED_CLS, prog, insns, "", 0,
+   NULL, 0);

I went over the set and it generally looks good. Please make sure, though,
as in the case above, that you align the continuation line with the
opening '('. I've noticed this multiple times in this patch and in the next
one at least. Please double check the rest of your series as well.

Thanks,
Daniel

Re: Extending socket timestamping API for NTP

2017-02-09 Thread Denny Page


> On Feb 08, 2017, at 16:45, Denny Page  wrote:
> 
> [Resend as plain text]
> 
> 
>> On Feb 07, 2017, at 06:01, Miroslav Lichvar  wrote:
>> 
>> 5) new SO_TIMESTAMPING options to get transposed RX timestamps
>> 
>>  PTP uses preamble RX timestamps, but NTP works with trailer RX
>>  timestamps. This means NTP implementations currently need to
>>  transpose HW RX timestamps. The calculation requires the link speed
>>  and the length of the packet at layer 2. It seems this can be
>>  reliably done only using raw sockets. It would be very nice if the
>>  kernel could transpose the timestamps automatically.
>> 
>>  The existing SOF_TIMESTAMPING_RX_HARDWARE flag could be aliased to
>>  SOF_TIMESTAMPING_RX_HARDWARE_PREAMBLE and the new flag could be
>>  SOF_TIMESTAMPING_RX_HARDWARE_TRAILER.
>> 
>>  PTP has a similar problem with SW RX timestamps, which are closer
>>  to the trailer timestamps rather than preamble timestamps. A new
>>  SOF_TIMESTAMPING_RX_SOFTWARE_PREAMBLE flag could be added for PTP
>>  implementations to get transposed timestamps in order to improve
>>  accuracy.
>> 
>> 6) new SO_TIMESTAMPING option to get PHC index with HW timestamps
>> 
>>  With bridges, bonding and other things it's difficult to determine
>>  which PHC timestamped the packet. It would be very useful if the
>>  PHC index was provided with each HW timestamp.
>> 
>>  I'm not sure what would be the best place to put it. I guess the
>>  second timespec in scm_timestamping could be reused for this, but
>>  that sounds like a gross hack. Do we need to define a new struct?
> 
> 
> Miroslav, if #5 were implemented, would #6 still be needed?
> 
> Denny

Miroslav, please ignore this. Of course you still need the index in order to 
get the PHC offset. My bad.

Denny
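
For reference, the transposition described in item 5 of the quoted message
needs only the link speed and the layer-2 frame length. A rough sketch of
that arithmetic follows. It assumes the hardware timestamp marks the first
bit of the frame after the start-of-frame delimiter and that the length
passed in covers the whole frame including the FCS; real hardware may differ
on both points. rx_preamble_to_trailer_ns() is a hypothetical helper, not an
existing kernel or library function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: shift a preamble (start-of-frame) RX timestamp to
 * the end of the frame by adding the time the frame spent on the wire.
 *
 *   duration_ns = frame_bits * 1000 / link_speed_mbps
 *
 * Assumes l2_len_bytes includes the FCS and that the timestamp was taken
 * at the first bit of the frame; both are assumptions of this sketch.
 */
static int64_t rx_preamble_to_trailer_ns(int64_t preamble_ts_ns,
					 uint32_t l2_len_bytes,
					 uint32_t link_speed_mbps)
{
	int64_t duration_ns;

	if (link_speed_mbps == 0)
		return preamble_ts_ns;	/* unknown speed: leave unchanged */

	duration_ns = (int64_t)l2_len_bytes * 8 * 1000 / link_speed_mbps;
	return preamble_ts_ns + duration_ns;
}
```

With these assumptions, a 90-byte frame on a 1000 Mb/s link adds 720 ns to
the preamble timestamp, and the same frame on a 100 Mb/s link adds 7200 ns.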
