date:20170414

Re: [PATCH 08/22] crypto: chcr: Make use of the new sg_map helper function

2017-04-14 Thread Harsh Jain

On Fri, Apr 14, 2017 at 3:35 AM, Logan Gunthorpe  wrote:
> The get_page in this area looks *highly* suspect due to there being no
> corresponding put_page. However, I've left that as is to avoid breaking
> things.
chcr driver will post the request to LLD driver cxgb4 and put_page is
implemented there. it will no harm. Any how
we have removed the below code from driver.

http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg24561.html

After this merge we can ignore your patch. Thanks

>
> I've also removed the KMAP_ATOMIC_ARGS check as it appears to be dead
> code that dates back to when it was first committed...


>
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/crypto/chelsio/chcr_algo.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/crypto/chelsio/chcr_algo.c 
> b/drivers/crypto/chelsio/chcr_algo.c
> index 41bc7f4..a993d1d 100644
> --- a/drivers/crypto/chelsio/chcr_algo.c
> +++ b/drivers/crypto/chelsio/chcr_algo.c
> @@ -1489,22 +1489,21 @@ static struct sk_buff *create_authenc_wr(struct 
> aead_request *req,
> return ERR_PTR(-EINVAL);
>  }
>
> -static void aes_gcm_empty_pld_pad(struct scatterlist *sg,
> - unsigned short offset)
> +static int aes_gcm_empty_pld_pad(struct scatterlist *sg,
> +unsigned short offset)
>  {
> -   struct page *spage;
> unsigned char *addr;
>
> -   spage = sg_page(sg);
> -   get_page(spage); /* so that it is not freed by NIC */
> -#ifdef KMAP_ATOMIC_ARGS
> -   addr = kmap_atomic(spage, KM_SOFTIRQ0);
> -#else
> -   addr = kmap_atomic(spage);
> -#endif
> -   memset(addr + sg->offset, 0, offset + 1);
> +   get_page(sg_page(sg)); /* so that it is not freed by NIC */
> +
> +   addr = sg_map(sg, SG_KMAP_ATOMIC);
> +   if (IS_ERR(addr))
> +   return PTR_ERR(addr);
> +
> +   memset(addr, 0, offset + 1);
> +   sg_unmap(sg, addr, SG_KMAP_ATOMIC);
>
> -   kunmap_atomic(addr);
> +   return 0;
>  }
>
>  static int set_msg_len(u8 *block, unsigned int msglen, int csize)
> @@ -1940,7 +1939,10 @@ static struct sk_buff *create_gcm_wr(struct 
> aead_request *req,
> if (req->cryptlen) {
> write_sg_to_skb(skb, &frags, src, req->cryptlen);
> } else {
> -   aes_gcm_empty_pld_pad(req->dst, authsize - 1);
> +   err = aes_gcm_empty_pld_pad(req->dst, authsize - 1);
> +   if (err)
> +   goto dstmap_fail;
> +
> write_sg_to_skb(skb, &frags, reqctx->dst, crypt_len);
>
> }
> --
> 2.1.4
>

Re: [PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events

2017-04-14 Thread David Ahern

+Nicolas

On 4/14/17 7:51 PM, Vlad Yasevich wrote:
> On 04/10/2017 11:49 AM, David Ahern wrote:
>> On 4/10/17 9:39 AM, Vlad Yasevich wrote:
>>> OK, so this will work for the events that are generated as a result of 
>>> device state change
>>> (like mtu, address, and others).
>>>
>>> However, the original event data may be needed for other events that may be
>>> of use to userspace like NETDEV_NOTIFY_PEERS and NETDEV_RESEND_IGMP 
>>> (possibly others...)
>>
>> sure. My objection is to multiple messages with identical content.
>>
>> I think the rtnetlink_event message is unique for those 2 netdev events,
>> so no objections if it has value.
>>
> 
> So, I've been looking at adding a bitmap and collecting all modification, 
> however
> I ran into an interesting issue in do_setlink.
> 
> Currently the notifications from do_setlink() don't appear to work as one 
> would
> expect and it's somewhat confusing upon deeper inspection.
> 
> We have 2 values DO_SETLINK_MODIFIED (1) and DO_SETLINK_NOTIFY (3).  These 2 
> 'attempt'
> to do different jobs, but really fail at it.  The function will generate 
> notifications
> regardless of which of the above values is used.
> 
> Those changes were done in commit ba9989069f4e426b1e0ed7018eacc9e1ba607095 
> (cc Nicolas
> just in case he remembers the history)
> 
> I am not sure which changes should really trigger a call 
> netdev_state_change(), thus this
> message.  Right now, all changes done in this function trigger them.  If 
> that's how that
> should function, that I can simplify the code.  If not, then some of the 
> changes may
> require us to export the event to the user.
> 
> To use the dreaded NETDEV_CHANGEMTU event as an example, we used to generate 
> 3 messages
> (PRECHANGEMTU, CHANGEMTU, and a message from netdev_state_change).  With 
> recent changes,
> we now only generate a message from netdev_state_change.  However, mtu change 
> is tagged
> with DO_SETLINK_MODIFIED which doesn't include the notify bit.  So, should 
> there be a
> NETDEV_CHANGE event associated with this change and a rtnl message (as it is 
> now) or not?
> It's unclear.
> 
> -vlad
>

Re: [PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events

2017-04-14 Thread Vlad Yasevich

On 04/10/2017 11:49 AM, David Ahern wrote:
> On 4/10/17 9:39 AM, Vlad Yasevich wrote:
>> OK, so this will work for the events that are generated as a result of 
>> device state change
>> (like mtu, address, and others).
>>
>> However, the original event data may be needed for other events that may be
>> of use to userspace like NETDEV_NOTIFY_PEERS and NETDEV_RESEND_IGMP 
>> (possibly others...)
> 
> sure. My objection is to multiple messages with identical content.
> 
> I think the rtnetlink_event message is unique for those 2 netdev events,
> so no objections if it has value.
> 

So, I've been looking at adding a bitmap and collecting all modification, 
however
I ran into an interesting issue in do_setlink.

Currently the notifications from do_setlink() don't appear to work as one would
expect and it's somewhat confusing upon deeper inspection.

We have 2 values DO_SETLINK_MODIFIED (1) and DO_SETLINK_NOTIFY (3).  These 2 
'attempt'
to do different jobs, but really fail at it.  The function will generate 
notifications
regardless of which of the above values is used.

Those changes were done in commit ba9989069f4e426b1e0ed7018eacc9e1ba607095 (cc 
Nicolas
just in case he remembers the history)

I am not sure which changes should really trigger a call netdev_state_change(), 
thus this
message.  Right now, all changes done in this function trigger them.  If that's 
how that
should function, that I can simplify the code.  If not, then some of the 
changes may
require us to export the event to the user.

To use the dreaded NETDEV_CHANGEMTU event as an example, we used to generate 3 
messages
(PRECHANGEMTU, CHANGEMTU, and a message from netdev_state_change).  With recent 
changes,
we now only generate a message from netdev_state_change.  However, mtu change 
is tagged
with DO_SETLINK_MODIFIED which doesn't include the notify bit.  So, should 
there be a
NETDEV_CHANGE event associated with this change and a rtnl message (as it is 
now) or not?
It's unclear.

-vlad

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Jakub Kicinski

On Fri, 14 Apr 2017 17:46:44 -0700, Alexei Starovoitov wrote:
> On Fri, Apr 14, 2017 at 03:30:32PM -0700, Jakub Kicinski wrote:
> > On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote:  
> > > On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote:  
> > > > > 
> > > > > We are consistently finding that there is this real need to
> > > > > communicate XDP capabilities, or somehow verify that the needs
> > > > > of an XDP program can be satisfied by a given implementation.
> > > > 
> > > > I fully agree that we need some way to express capabilities[1]
> > > > 
> > > > > Maximum headroom is just one.
> > > 
> > > I don't like the idea of asking program author to explain capabilities
> > > to the kernel. Right now the verifier already understands more about
> > > the program than human does. If the verifier cannot deduct from the
> > > insns what program will be doing, it's pretty much guarantee
> > > that program author has no idea either.
> > > If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD,
> > > the users will just pass something like 64 or 128 whereas the program
> > > might only be doing IPIP encap and that will cause kernel to
> > > provide extra headroom for no good reason or reject the program
> > > whereas it could have run just fine.
> > > 
> > > So I very much agree with this part:  
> > > > ... or somehow verify that the needs
> > > > of an XDP program can be satisfied by a given implementation.
> > > 
> > > we already have cb_access, dst_needed, xdp_adjust_head flags
> > > that verifier discovers in the program.
> > > For headroom we need one more. The verifier can see
> > > the constant passed into bpf_xdp_adjust_head().
> > > It's trickier if it's a variable, but the verifier can estimate min/max
> > > of the variable already and worst case it can say that it
> > > will be XDP_PACKET_HEADROOM.
> > > If the program is doing variable length bpf_xdp_adjust_head(),
> > > the human has no idea how much they need and cannot tell kernel anyway.  
> > 
> > If I understand correctly that the current proposal is to:
> > 
> > 1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend.
> > 2. Make verifier try to lower that through byte code analysis.
> > 3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend).  
> 
> no. the hard fail will be wrong, since verifier is conservative.
> As i showed in the example with variable ip hdr length.
> 
> > We may expect most users won't care and we may want to still depend on
> > the verifier to analyze the program and enable optimizations, but
> > depending on the verifier for proving prepend lengths is scary.  Or are
> > we just discussion the optimizations here and not the guarantees about
> > headroom availability?  
> 
> yes. I was thinking as an optimization. If verifier can see that
> prog needs only 20 bytes of headroom or no headroom (like netronome
> already does) than the ring can be configured differently for hopefully
> better performance.
> But that's probably bad idea, since I was assuming that such ring
> reconfiguration can be made fast, which is unlikely.
> If it takes seconds to setup a ring, then drivers should just
> assume XDP_PACKET_HEADROOM, since at that time the program
> properties are unknown and in the future other programs will be loaded.
> Take a look at our current setup with slots for xdp_dump, ddos, lb
> programs. Only root program is attached at the begining.
> If the driver configures the ring for such empty program that would break
> dynamic addition of lb prog.
> The driver must not interrupt the traffic when user adds another
> prog to prog array. In such case XDP side doesn't even know that
> prog array is used. It's all happening purely on bpf side.
>
> > Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards
> > and then be in for a nasty surprise when they try to load the same
> > program for Intel parts :(  
> 
> are you talking hw offloaded prog or normal in-driver xdp?

In-driver, mostly.  As you said earlier in the thread we expect
Alex's/Intel's patches to provide "only" 192B of headroom.  For the NFP
offload there are trade-offs to full 256B offset.  I think we could just
fallback to the host, but if user hints that the headroom is big
perhaps we would be better off not even trying to offload.

> > I agree that trying to express capabilities to user space is quickly
> > going to become more confusing than helpful.  However, I think here we
> > need the opposite - we need an opt-in way of program requesting
> > guarantees about the datapath.  
> 
> what is an example of what you proposing? How will it look from user pov?

This is roughly what I had in mind.  From user's perspective it would
probably make sense to tie it to the program itself, so:

bpf_helpers.h:

#define __XDP_ATTR  \
__attribute__((section(ELF_SECTION_XDP_ATTRS), used))

#define XDP_GUARANTEE_HEADROOM(val) \
__u32 xdp_gua

Re: [PATCH net-next v2] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-14 Thread Alexei Starovoitov

On Fri, Apr 14, 2017 at 06:25:26PM -0700, Chenbo Feng wrote:
> From: Chenbo Feng 
> 
> BPF helper functions get_socket_cookie and get_socket_uid can be
> used for network traffic classifications, among others. Expose
> them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
> commit 8f917bba0042 ("bpf: pass sk to helper functions") the
> required skb->sk function is available at both cgroup bpf ingress
> and egress hooks. With these two new helper, cg_skb_func_proto is
> effectively the same as sk_filter_func_proto.
> 
> Change since V1:
> Instead of add the helper to cg_skb_func_proto, redirect the
> cg_skb_func_proto to sk_filter_func_proto since all helper function
> in sk_filter_func_proto are applicable to cg_skb_func_proto now.
> 
> Signed-off-by: Chenbo Feng 

Thanks!
Acked-by: Alexei Starovoitov

[PATCH net-next v2] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-14 Thread Chenbo Feng

From: Chenbo Feng 

BPF helper functions get_socket_cookie and get_socket_uid can be
used for network traffic classifications, among others. Expose
them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
commit 8f917bba0042 ("bpf: pass sk to helper functions") the
required skb->sk function is available at both cgroup bpf ingress
and egress hooks. With these two new helper, cg_skb_func_proto is
effectively the same as sk_filter_func_proto.

Change since V1:
Instead of add the helper to cg_skb_func_proto, redirect the
cg_skb_func_proto to sk_filter_func_proto since all helper function
in sk_filter_func_proto are applicable to cg_skb_func_proto now.

Signed-off-by: Chenbo Feng 
---
 net/core/filter.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index ce2a19d..19be954 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2766,12 +2766,7 @@ xdp_func_proto(enum bpf_func_id func_id)
 static const struct bpf_func_proto *
 cg_skb_func_proto(enum bpf_func_id func_id)
 {
-   switch (func_id) {
-   case BPF_FUNC_skb_load_bytes:
-   return &bpf_skb_load_bytes_proto;
-   default:
-   return bpf_base_func_proto(func_id);
-   }
+   return sk_filter_func_proto(func_id);
 }
 
 static const struct bpf_func_proto *
-- 
2.7.4

Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-14 Thread Alexei Starovoitov

On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
> +
> + switch (act) {
> + case XDP_TX:
> + __skb_push(skb, skb->mac_len);

s/skb->mac_len/mac_len/

> + HARD_TX_UNLOCK(dev, txq);
> + if (free_skb) {
> + trace_xdp_exception(dev, xdp_prog, XDP_TX);
> + kfree_skb(skb);

nice that you didn't forget to add trace_xdp_exception in this path :)

Overall looks good to me and other than the minor nit in tx, i think it
should work for programs already used with in-driver xdp.
I'll test it next week unless people beat me to it.

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Alexei Starovoitov

On Fri, Apr 14, 2017 at 03:30:32PM -0700, Jakub Kicinski wrote:
> On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote:
> > On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote:
> > > > 
> > > > We are consistently finding that there is this real need to
> > > > communicate XDP capabilities, or somehow verify that the needs
> > > > of an XDP program can be satisfied by a given implementation.  
> > > 
> > > I fully agree that we need some way to express capabilities[1]
> > >   
> > > > Maximum headroom is just one.  
> > 
> > I don't like the idea of asking program author to explain capabilities
> > to the kernel. Right now the verifier already understands more about
> > the program than human does. If the verifier cannot deduct from the
> > insns what program will be doing, it's pretty much guarantee
> > that program author has no idea either.
> > If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD,
> > the users will just pass something like 64 or 128 whereas the program
> > might only be doing IPIP encap and that will cause kernel to
> > provide extra headroom for no good reason or reject the program
> > whereas it could have run just fine.
> > 
> > So I very much agree with this part:
> > > ... or somehow verify that the needs
> > > of an XDP program can be satisfied by a given implementation.  
> > 
> > we already have cb_access, dst_needed, xdp_adjust_head flags
> > that verifier discovers in the program.
> > For headroom we need one more. The verifier can see
> > the constant passed into bpf_xdp_adjust_head().
> > It's trickier if it's a variable, but the verifier can estimate min/max
> > of the variable already and worst case it can say that it
> > will be XDP_PACKET_HEADROOM.
> > If the program is doing variable length bpf_xdp_adjust_head(),
> > the human has no idea how much they need and cannot tell kernel anyway.
> 
> If I understand correctly that the current proposal is to:
> 
> 1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend.
> 2. Make verifier try to lower that through byte code analysis.
> 3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend).

no. the hard fail will be wrong, since verifier is conservative.
As i showed in the example with variable ip hdr length.

> We may expect most users won't care and we may want to still depend on
> the verifier to analyze the program and enable optimizations, but
> depending on the verifier for proving prepend lengths is scary.  Or are
> we just discussion the optimizations here and not the guarantees about
> headroom availability?

yes. I was thinking as an optimization. If verifier can see that
prog needs only 20 bytes of headroom or no headroom (like netronome
already does) than the ring can be configured differently for hopefully
better performance.
But that's probably bad idea, since I was assuming that such ring
reconfiguration can be made fast, which is unlikely.
If it takes seconds to setup a ring, then drivers should just
assume XDP_PACKET_HEADROOM, since at that time the program
properties are unknown and in the future other programs will be loaded.
Take a look at our current setup with slots for xdp_dump, ddos, lb
programs. Only root program is attached at the begining.
If the driver configures the ring for such empty program that would break
dynamic addition of lb prog.
The driver must not interrupt the traffic when user adds another
prog to prog array. In such case XDP side doesn't even know that
prog array is used. It's all happening purely on bpf side.

> Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards
> and then be in for a nasty surprise when they try to load the same
> program for Intel parts :(

are you talking hw offloaded prog or normal in-driver xdp?

> I agree that trying to express capabilities to user space is quickly
> going to become more confusing than helpful.  However, I think here we
> need the opposite - we need an opt-in way of program requesting
> guarantees about the datapath.

what is an example of what you proposing? How will it look from user pov?

Re: ipv6 udp early demux breaks udp_l3mdev_accept=0

2017-04-14 Thread David Ahern

On 4/14/17 6:38 PM, Subash Abhinov Kasiviswanathan wrote:
> On 2017-04-14 16:23, David Ahern wrote:
>> Subash:
>>
>> My understanding of early demux is that it should only match connected
>> sockets (src/sport + dst/dport are all set). The ipv6 udp early demux
>> code added in 5425077d73e0c is doing a generic socket lookup which does
>> not require an exact match and accordingly breaks udp_l3mdev_accept=0.
>> Please fix.
>>
>> David
> 
> Hi David
> 
> Would it be possible to disable udp_early_demux for your case temporarily.
> I'll look into this.
> 

It should be fixed for 4.12. Basically, __udp6_lib_demux_lookup needs to
be more like __udp4_lib_demux_lookup

Re: ipv6 udp early demux breaks udp_l3mdev_accept=0

2017-04-14 Thread Subash Abhinov Kasiviswanathan


On 2017-04-14 16:23, David Ahern wrote:

Subash:

My understanding of early demux is that it should only match connected
sockets (src/sport + dst/dport are all set). The ipv6 udp early demux
code added in 5425077d73e0c is doing a generic socket lookup which does
not require an exact match and accordingly breaks udp_l3mdev_accept=0.
Please fix.

David


Hi David

Would it be possible to disable udp_early_demux for your case 
temporarily.

I'll look into this.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

a Linux Foundation Collaborative Project

Re: [PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-14 Thread Daniel Borkmann


On 04/15/2017 02:07 AM, Alexei Starovoitov wrote:

On Fri, Apr 14, 2017 at 04:12:14PM -0700, Chenbo Feng wrote:

From: Chenbo Feng 

BPF helper functions get_socket_cookie and get_socket_uid can be
used for network traffic classifications, among others. Expose
them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
commit 8f917bba0042 ("bpf: pass sk to helper functions") the required
skb->sk function is available at both cgroup bpf ingress and egress
hooks.

Signed-off-by: Chenbo Feng 


Thanks for follow up.
Another alternative is to do
cg_skb_func_proto(enum bpf_func_id func_id)
{
 return sk_filter_func_proto(func_id);
}

I think all socket filter helpers are applicable to cg_skb too.


Yeah, both will effectively be the same at that point:

static const struct bpf_func_proto *
sk_filter_func_proto(enum bpf_func_id func_id)
{
switch (func_id) {
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
case BPF_FUNC_get_socket_cookie:
return &bpf_get_socket_cookie_proto;
case BPF_FUNC_get_socket_uid:
return &bpf_get_socket_uid_proto;
default:
return bpf_base_func_proto(func_id);
}
}

And with the two additions:

static const struct bpf_func_proto *
cg_skb_func_proto(enum bpf_func_id func_id)
{
switch (func_id) {
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_proto;
+   case BPF_FUNC_get_socket_uid:
+   return &bpf_get_socket_uid_proto;
default:
return bpf_base_func_proto(func_id);
}
}

Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface

2017-04-14 Thread Jay Vosburgh

Mahesh Bandewar (महेश बंडेवार) wrote:

>On Fri, Apr 14, 2017 at 12:30 PM, Jay Vosburgh
> wrote:
>>
>>
>> Chonggang Li  wrote:
>>
>> >Previously link-local packets excluding LACP (which are handled by
>> >the recv_probe) received on bond slave interfaces are delivered to
>> >stack with bond-master device with RX_HANDLER_ANOTHER, however all
>> >link-local packets are link specific and should be delivered with
>> >exact same link/dev.
>>
>> In what case is the current behavior a problem (my guess would
>> be something related to LLDP)?  What if, e.g., the bond is a bridge
>> port, will STP frames no longer propagate to the bridge?
>>
>When a link-local frame made appear to arrive on bond-master, it
>looses value since 'the info' about link on which it arrived is lost.

This information should really be in the commit message, along
with something about the LLDP issue being solved.

>So idea behind this is to pass the frame to the stack with the link on
>which it arrived without involving the bonding-master. The same will
>happen for the STP frames too. Do you see problem with this approach?

My look through the STP code suggests not, but I'm far from an
expert there.  I'm just concerned that this change will cause some
obscure regression in something that depends on the current behavior.

>> Also, I think the description would be better if it mentioned
>> specifically that the patch is changing how skb->dev is set for link
>> local frames (bond->dev vs receiving interface), e.g.,
>>
>> "[...] however all link-local packets are link specific and
>> should be delivered with skb->dev set to the original device."
>>
>yes, makes sense.
>
>> >Signed-off-by: Chonggang Li 
>> >Signed-off-by: Mahesh Bandewar 
>> >Signed-off-by: Maciej Żenczykowski 
>> >---
>> > drivers/net/bonding/bond_main.c | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> >diff --git a/drivers/net/bonding/bond_main.c 
>> >b/drivers/net/bonding/bond_main.c
>> >index 01e4a69af421..aeca3d8541b9 100644
>> >--- a/drivers/net/bonding/bond_main.c
>> >+++ b/drivers/net/bonding/bond_main.c
>> >@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct 
>> >sk_buff **pskb)
>> >   }
>> >   }
>> >
>> >+  /* link-local packets should not be passed to master interface */
>> >+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
>> >+  return RX_HANDLER_PASS;
>>
>> Since this returns _PASS and not _EXACT, the packet will go
>> through the ptype_base packet handlers, so is the comment strictly
>> correct?
>>
>The stack does not have all link-local specific packet-type handlers
>registered by default so returning _EXACT would find nothing and
>packet will be dropped, right? So returning _PASS will deliver packet
>to the stack with skb->dev as the link on which packet arrived and
>stack can take whatever (default) action it has to take.

Fair enough; I do think the comment would be better phrased as
something like "don't change skb->dev of link local frames"; on first
reading, I thought it meant the frames would be dropped.

-J

>> >   if (bond_should_deliver_exact_match(skb, slave, bond))
>> >   return RX_HANDLER_EXACT;
>> >
>> >--
>> >2.12.2.762.g0e3151a226-goog
>> >
>>
---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-14 Thread Alexei Starovoitov

On Fri, Apr 14, 2017 at 04:12:14PM -0700, Chenbo Feng wrote:
> From: Chenbo Feng 
> 
> BPF helper functions get_socket_cookie and get_socket_uid can be
> used for network traffic classifications, among others. Expose
> them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
> commit 8f917bba0042 ("bpf: pass sk to helper functions") the required
> skb->sk function is available at both cgroup bpf ingress and egress
> hooks.
> 
> Signed-off-by: Chenbo Feng 

Thanks for follow up.
Another alternative is to do
cg_skb_func_proto(enum bpf_func_id func_id)
{
return sk_filter_func_proto(func_id);
}

I think all socket filter helpers are applicable to cg_skb too.

[PATCH net-next] drivers: net: xgene-v2: Extend ethtool statistics

2017-04-14 Thread Iyappan Subramanian

This patch adds extended statistics reporting to ethtool.

In summary, this patch,

   - adds ethtool.h with the statistics register definitions
   - adds 'struct xge_gstrings_extd_stats' to gather extended stats
   - modifies xge_get_strings(), get_sset_count() and
 get_ethtool_stats() accordingly
   - moves 'struct xge_gstrings_stats' to ethtool.h

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene-v2/ethtool.c | 82 ++---
 drivers/net/ethernet/apm/xgene-v2/ethtool.h | 82 +
 drivers/net/ethernet/apm/xgene-v2/mac.h |  3 --
 drivers/net/ethernet/apm/xgene-v2/main.h|  2 +-
 4 files changed, 159 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/ethtool.h

diff --git a/drivers/net/ethernet/apm/xgene-v2/ethtool.c 
b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
index 0c426f5..421b1dd 100644
--- a/drivers/net/ethernet/apm/xgene-v2/ethtool.c
+++ b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
@@ -21,12 +21,13 @@
 
 #include "main.h"
 
-struct xge_gstrings_stats {
-   char name[ETH_GSTRING_LEN];
-   int offset;
-};
-
 #define XGE_STAT(m){ #m, offsetof(struct xge_pdata, stats.m) }
+#define XGE_EXTD_STAT(m, n)\
+   {   \
+   #m, \
+   n,  \
+   0   \
+   }
 
 static const struct xge_gstrings_stats gstrings_stats[] = {
XGE_STAT(rx_packets),
@@ -36,7 +37,66 @@ struct xge_gstrings_stats {
XGE_STAT(rx_errors)
 };
 
+static struct xge_gstrings_extd_stats gstrings_extd_stats[] = {
+   XGE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64),
+   XGE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127),
+   XGE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255),
+   XGE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511),
+   XGE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K),
+   XGE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX),
+   XGE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV),
+   XGE_EXTD_STAT(rx_byte_cntr, RBYT),
+   XGE_EXTD_STAT(rx_pkt_cntr, RPKT),
+   XGE_EXTD_STAT(rx_fcs_error_cntr, RFCS),
+   XGE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA),
+   XGE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA),
+   XGE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF),
+   XGE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF),
+   XGE_EXTD_STAT(rx_unk_opcode_cntr, RXUO),
+   XGE_EXTD_STAT(rx_align_err_cntr, RALN),
+   XGE_EXTD_STAT(rx_frame_len_err_cntr, RFLR),
+   XGE_EXTD_STAT(rx_code_err_cntr, RCDE),
+   XGE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE),
+   XGE_EXTD_STAT(rx_undersize_pkt_cntr, RUND),
+   XGE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR),
+   XGE_EXTD_STAT(rx_fragments_cntr, RFRG),
+   XGE_EXTD_STAT(rx_jabber_cntr, RJBR),
+   XGE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP),
+   XGE_EXTD_STAT(tx_byte_cntr, TBYT),
+   XGE_EXTD_STAT(tx_pkt_cntr, TPKT),
+   XGE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA),
+   XGE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA),
+   XGE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF),
+   XGE_EXTD_STAT(tx_defer_pkt_cntr, TDFR),
+   XGE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF),
+   XGE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL),
+   XGE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL),
+   XGE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL),
+   XGE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL),
+   XGE_EXTD_STAT(tx_total_col_cntr, TNCL),
+   XGE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH),
+   XGE_EXTD_STAT(tx_drop_frame_cntr, TDRP),
+   XGE_EXTD_STAT(tx_jabber_frame_cntr, TJBR),
+   XGE_EXTD_STAT(tx_fcs_error_cntr, TFCS),
+   XGE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF),
+   XGE_EXTD_STAT(tx_oversize_frame_cntr, TOVR),
+   XGE_EXTD_STAT(tx_undersize_frame_cntr, TUND),
+   XGE_EXTD_STAT(tx_fragments_cntr, TFRG)
+};
+
 #define XGE_STATS_LEN  ARRAY_SIZE(gstrings_stats)
+#define XGE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats)
+
+static void xge_mac_get_extd_stats(struct xge_pdata *pdata)
+{
+   u32 data;
+   int i;
+
+   for (i = 0; i < XGE_EXTD_STATS_LEN; i++) {
+   data = xge_rd_csr(pdata, gstrings_extd_stats[i].addr);
+   gstrings_extd_stats[i].value += data;
+   }
+}
 
 static void xge_get_drvinfo(struct net_device *ndev,
struct ethtool_drvinfo *info)
@@ -62,6 +122,11 @@ static void xge_get_strings(struct net_device *ndev, u32 
stringset, u8 *data)
memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
+
+   for (i = 0; i < XGE_EXTD_STATS_LEN; i++) {
+   memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN);
+   p += ETH_GSTRING_LEN;
+   }
 }
 
 static int xge_get_sset_count(struct net_device

[PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-14 Thread Chenbo Feng

From: Chenbo Feng 

BPF helper functions get_socket_cookie and get_socket_uid can be
used for network traffic classifications, among others. Expose
them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
commit 8f917bba0042 ("bpf: pass sk to helper functions") the required
skb->sk function is available at both cgroup bpf ingress and egress
hooks.

Signed-off-by: Chenbo Feng 
---
 net/core/filter.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index ce2a19d..b6db9e330 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2769,6 +2769,10 @@ cg_skb_func_proto(enum bpf_func_id func_id)
switch (func_id) {
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_proto;
+   case BPF_FUNC_get_socket_uid:
+   return &bpf_get_socket_uid_proto;
default:
return bpf_base_func_proto(func_id);
}
-- 
2.7.4

[PATCH iproute2 v2] ip vrf: Add command name next to pid

2017-04-14 Thread David Ahern

'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

$ ip vrf pids mgmt
 1121  ntpd
 1418  gdm-session-wor
 1488  gnome-session
 1491  dbus-launch
 1492  dbus-daemon
 1565  sshd
 ...

Signed-off-by: David Ahern 
---
v2
- changed get_comm to get_command_name with size_t for len type as
  requested by Stephen

 include/utils.h |  1 +
 ip/ipvrf.c  | 24 ++--
 lib/fs.c| 42 ++
 3 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 22369e0b4e03..8c12e1e2a60c 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -260,5 +260,6 @@ int get_real_family(int rtm_type, int rtm_family);
 int cmd_exec(const char *cmd, char **argv, bool do_fork);
 int make_path(const char *path, mode_t mode);
 char *find_cgroup2_mount(void);
+int get_command_name(const char *pid, char *comm, size_t len);
 
 #endif /* __UTILS_H__ */
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index 5e204a9ebbb1..0f611b44b78a 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -111,27 +111,31 @@ static void read_cgroup_pids(const char *base_path, char 
*name)
 {
char path[PATH_MAX];
char buf[4096];
-   ssize_t n;
-   int fd;
+   FILE *fp;
 
if (snprintf(path, sizeof(path), "%s/vrf/%s%s",
 base_path, name, CGRP_PROC_FILE) >= sizeof(path))
return;
 
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
+   fp = fopen(path, "r");
+   if (!fp)
return; /* no cgroup file, nothing to show */
 
/* dump contents (pids) of cgroup.procs */
-   while (1) {
-   n = read(fd, buf, sizeof(buf) - 1);
-   if (n <= 0)
-   break;
+   while (fgets(buf, sizeof(buf), fp)) {
+   char *nl, comm[32];
 
-   printf("%s", buf);
+   nl = strchr(buf, '\n');
+   if (nl)
+   *nl = '\0';
+
+   if (get_command_name(buf, comm, sizeof(comm)))
+   strcpy(comm, "");
+
+   printf("%5s  %s\n", buf, comm);
}
 
-   close(fd);
+   fclose(fp);
 }
 
 /* recurse path looking for PATH[/NETNS]/vrf/NAME */
diff --git a/lib/fs.c b/lib/fs.c
index 12a4657a0bc9..c59ac564581d 100644
--- a/lib/fs.c
+++ b/lib/fs.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -149,3 +150,44 @@ int make_path(const char *path, mode_t mode)
 
return rc;
 }
+
+int get_command_name(const char *pid, char *comm, size_t len)
+{
+   char path[PATH_MAX];
+   char line[128];
+   FILE *fp;
+
+   if (snprintf(path, sizeof(path),
+"/proc/%s/status", pid) >= sizeof(path)) {
+   return -1;
+   }
+
+   fp = fopen(path, "r");
+   if (!fp)
+   return -1;
+
+   comm[0] = '\0';
+   while (fgets(line, sizeof(line), fp)) {
+   char *nl, *name;
+
+   name = strstr(line, "Name:");
+   if (!name)
+   continue;
+
+   name += 5;
+   while (isspace(*name))
+   name++;
+
+   nl = strchr(name, '\n');
+   if (nl)
+   *nl = '\0';
+
+   strncpy(comm, name, len - 1);
+   comm[len - 1] = '\0';
+   break;
+   }
+
+   fclose(fp);
+
+   return 0;
+}
-- 
2.1.4

Re: [PATCH iproute2] ip vrf: Add command name next to pid

2017-04-14 Thread David Ahern

On 4/14/17 4:57 PM, Stephen Hemminger wrote:
> Overall, this looks like a good idea.
> 
>> +
>> +int get_comm(char *pid, char *comm, int len)
> 
> 
> Please use more glibc style function signature, like:
> 
> int get_comm(const char *pid, char *comm, size_t len)
> 
> And maybe even
> int get_command_name(pid_t pid, char *comm, size_t len)
> 

I'd pref to keep as char *. No sense converting from string to pid_t to
string. The vrf code is reading a cgroups file; the other potential user
is netns which is also reading pid from a file.

Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation

2017-04-14 Thread Subash Abhinov Kasiviswanathan


+https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/\
+dataservices/tree/rmnetctl


Don't split URL better to have long line.

diff --git a/drivers/net/rmnet/Kconfig b/drivers/net/rmnet/Kconfig


Since this is Qualcomm and Ethernet specific, maybe better to put
in drivers/net/ethernet/qualcom/rmnet


+config RMNET_DEBUG


Please use network device standard debug mechanism.
netif_msg_XXX


+   buffloc += snprintf(&buffer[buffloc], sizeof(buffer) - buffloc,
+   " %02x", skb->data[i]);


If you really have to do this. Use hex_dump_bytes API.


+   skb->mac_header = 0;
+   skb->mac_len = 0;
+}


Why not use sbk_set_mac_header(skb, 0)?


+static inline struct rmnet_phys_ep_conf_s *_rmnet_get_phys_ep_config
+   (struct net_device *dev)

awkward line break.
dev could be const


+   config = (struct rmnet_phys_ep_conf_s *)
+ rcu_dereference(dev->rx_handler_data);


Please don't directly reference rx_handler. There is already functions
like netdev_is_rx_handler_busy() to abstract that API.


+ * @epconfig:endpoint configuration structure to set
+ */


You are using docbook format here, but this is not a docbook comment.
ie.
/**
 * function - This is a docbook comment
 * @dev: this is a param
 */

Plus these are static functions so there is no point in documentating
internal API with docbook.


+   ASSERT_RTNL();
+
+   if (!dev || config_id < RMNET_LOCAL_LOGICAL_ENDPOINT ||
+   config_id >= RMNET_MAX_LOGICAL_EP)
+   return -EINVAL;


For internal API's you should validate parmeters at the external
entry point not in each call. Otherwise you have a multitude of
impossible error checks and dead code paths.


+   .next = 0,
+   .priority = 0
+};

Don't initialize fields that are not used or should be zero.



Hi Stephen

I'll make these changes.



Could just be:

static inline int _rmnet_is_physical_endpoint_associated(const struct
net_device *dev)
{
rx_handler_func_t *rx_handler
= rcu_dereference(dev->rx_handler);

return rx_handler == rmet_rx_handler;
}

But standard practice is to use ndo_ops to identify self in network 
drivers.

I.e
return dev->netdev_ops == &rmnet_device_ops;



The netdevice which is associated is the physical (real_dev). This 
device
can vary based on hardware and will have its own netdev_ops associated 
with it.

rmnet_device_ops applies to the rmnet devices only. I'll use the first
option you have suggested.


+   if (!data[IFLA_RMNET_MUX_ID])
+   return -EINVAL;


So you are inventing private link netlink attributes.
Why? Why can't you use device switch, bridge, or other master/slave 
model.




I would like to eventually add more configuration options for the
ingress & egress data formats as well as the endpoint configuration
(VND mode vs BRIDGE mode). I was able to re-use IFLA_VLAN_ID for
IFLA_RMNET_MUX_ID and test but it wasn't sufficient for the additional
parameters. I'll check the bridge attributes and try to reuse it.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

a Linux Foundation Collaborative Project

Re: [iproute2 net-next v2 0/3] ip netconf improvements

2017-04-14 Thread Stephen Hemminger

On Thu, 23 Mar 2017 19:51:19 -0700
David Ahern  wrote:

> Currently, ip netconf only shows data for ipv4 and ipv6 for dumps
> and just ipv4 for device requests. Improve the user experience by
> using the new kernel patch to dump all address families that have
> registered. For example, if mpls_router module is loaded then mpls
> values are displayed along with ipv4 and ipv6.
> 
> If the new feature is not supported (new iproute2 on older kernel)
> the kernel returns the nlmsg error EOPNOTSUPP which can be trapped
> and fallback to existing behavior.
> 
> v2
> - fixed index conversion in patch 3 per nicholas' comment
> 
> David Ahern (3):
>   netlink: Add flag to suppress print of nlmsg error
>   ip netconf: Show all address families by default in dumps
>   ip netconf: show all families on dev request
> 
>  include/libnetlink.h |  1 +
>  ip/ipnetconf.c   | 36 +---
>  lib/libnetlink.c |  3 ++-
>  3 files changed, 28 insertions(+), 12 deletions(-)
> 

Sure applied, just wanted to make sure there was nothing interrelated
with recent spate of changes to netlink notifications.

Re: [patch iproute2/net-next repost] devlink: Add support for pipeline debug (dpipe)

2017-04-14 Thread Stephen Hemminger

On Thu, 13 Apr 2017 11:30:27 +0200
Jiri Pirko  wrote:

> We actually took this code from teamdctl (at least it was an influence).
> Devlink style is so much different in every aspect from the rest of the
> iproute2 suite. And I did it on purpose, because it is much nicer and
> easier to read. I would like to continue on this and don't do things in
> the was the existing tools do. I don't see any problem with that.

No. That is road to ruin. Every package is free to use what ever style
it wants. But don't crossover please.

Re: [PATCH iproute2] ip vrf: Add command name next to pid

2017-04-14 Thread Stephen Hemminger

Overall, this looks like a good idea.

> +
> +int get_comm(char *pid, char *comm, int len)


Please use more glibc style function signature, like:

int get_comm(const char *pid, char *comm, size_t len)

And maybe even
int get_command_name(pid_t pid, char *comm, size_t len)

[PATCH] net: natsemi: ns83820: add checks for dma mapping error

2017-04-14 Thread Alexey Khoroshilov

The driver does not check if mapping dma memory succeed.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/ethernet/natsemi/ns83820.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index 729095db3e08..7d6d692ebb92 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -534,14 +534,19 @@ static inline int ns83820_add_rx_skb(struct ns83820 *dev, 
struct sk_buff *skb)
);
 #endif
 
+   buf = pci_map_single(dev->pci_dev, skb->data,
+REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+   if (pci_dma_mapping_error(dev->pci_dev, buf)) {
+   kfree_skb(skb);
+   return 1;
+   }
+
sg = dev->rx_info.descs + (next_empty * DESC_SIZE);
BUG_ON(NULL != dev->rx_info.skbs[next_empty]);
dev->rx_info.skbs[next_empty] = skb;
 
dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC;
cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR;
-   buf = pci_map_single(dev->pci_dev, skb->data,
-REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
build_rx_desc(dev, sg, 0, buf, cmdsts, 0);
/* update link of previous rx */
if (likely(next_empty != dev->rx_info.next_rx))
@@ -1136,6 +1141,10 @@ static netdev_tx_t ns83820_hard_start_xmit(struct 
sk_buff *skb,
if (nr_frags)
len -= skb->data_len;
buf = pci_map_single(dev->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+   if (pci_dma_mapping_error(dev->pci_dev, buf)) {
+   dev_kfree_skb_any(skb);
+   return NETDEV_TX_OK;
+   }
 
first_desc = dev->tx_descs + (free_idx * DESC_SIZE);
 
-- 
2.7.4

Re: [PATCH next 2/5] bonding: initialize work-queues during creation of bond

2017-04-14 Thread Joe Stringer

On 8 March 2017 at 10:55, Mahesh Bandewar  wrote:
> From: Mahesh Bandewar 
>
> Initializing work-queues every time ifup operation performed is unnecessary
> and can be performed only once when the port is created.
>
> Signed-off-by: Mahesh Bandewar 
> ---
>  drivers/net/bonding/bond_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 619f0c65f18a..1329110ed85f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3270,8 +3270,6 @@ static int bond_open(struct net_device *bond_dev)
> }
> }
>
> -   bond_work_init_all(bond);
> -
> if (bond_is_lb(bond)) {
> /* bond_alb_initialize must be called before the timer
>  * is started.
> @@ -4691,6 +4689,8 @@ int bond_create(struct net *net, const char *name)
>
> netif_carrier_off(bond_dev);
>
> +   bond_work_init_all(bond);
> +
> rtnl_unlock();
> if (res < 0)
> bond_destructor(bond_dev);
> --

Hi Mahesh,

I've noticed that this patch breaks bonding within namespaces if
you're not careful to perform device cleanup correctly.

Here's my repro script, you can run on any net-next with this patch
and you'll start seeing some weird behaviour:

ip netns add foo
ip li add veth0 type veth peer name veth0+ netns foo
ip li add veth1 type veth peer name veth1+ netns foo
ip netns exec foo ip li add bond0 type bond
ip netns exec foo ip li set dev veth0+ master bond0
ip netns exec foo ip li set dev veth1+ master bond0
ip netns exec foo ip addr add dev bond0 192.168.0.1/24
ip netns exec foo ip li set dev bond0 up
ip li del dev veth0
ip li del dev veth1

The second to last command segfaults, last command hangs. rtnl is now
permanently locked. It's not a problem if you take bond0 down before
deleting veths, or delete bond0 before deleting veths. If you delete
either end of the veth pair as per above, either inside or outside the
namespace, it hits this problem.

Here's some kernel logs:
[ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
[ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
[ 1281.193863] bond0: Releasing backup interface veth0+
[ 1281.193866] bond0: the permanent HWaddr of veth0+ -
16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
veth0+ to a different address to avoid conflicts
[ 1281.193867] [ cut here ]
[ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
__queue_delayed_work+0x13f/0x150
[ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
hid mptspi mptscsih e1000 mptbase ahci libahci
[ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: GW
4.10.0-bisect-bond-v0.14 #37
[ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 1281.193906] Call Trace:
[ 1281.193912]  dump_stack+0x63/0x89
[ 1281.193915]  __warn+0xd1/0xf0
[ 1281.193917]  warn_slowpath_null+0x1d/0x20
[ 1281.193918]  __queue_delayed_work+0x13f/0x150
[ 1281.193920]  queue_delayed_work_on+0x27/0x40
[ 1281.193929]  bond_change_active_slave+0x25b/0x670 [bonding]
[ 1281.193932]  ? synchronize_rcu_expedited+0x27/0x30
[ 1281.193935]  __bond_release_one+0x489/0x510 [bonding]
[ 1281.193939]  ? addrconf_notify+0x1b7/0xab0
[ 1281.193942]  bond_netdev_event+0x2c5/0x2e0 [bonding]
[ 1281.193944]  ? netconsole_netdev_event+0x124/0x190 [netconsole]
[ 1281.193947]  notifier_call_chain+0x49/0x70
[ 1281.193948]  raw_notifier_call_chain+0x16/0x20
[ 1281.193950]  call_netdevice_notifiers_info+0x35/0x60
[ 1281.193951]  rollback_registered_many+0x23b/0x3e0
[ 1281.193953]  unregister_netdevice_many+0x24/0xd0
[ 1281.193955]  rtnl_delete_link+0x3c/0x50
[ 1281.193956]  rtnl_dellink+0x8d/0x1b0
[ 1281.193960]  rtnetlink_rcv_msg+0x95/0x220
[ 1281.193962]  ? __kmalloc_node_track_caller+0x35/0x280
[ 1281.193964]  ? __netlink_lookup+0xf1/0x110
[ 1281.193966]  ? rtnl_newlink+0x830/0x830
[ 1281.193967]  netlink_rcv_skb+0xa7/0xc0
[ 1281.193969]  rtnetlink_rcv+0x28/0x30
[ 1281.193970]  netlink_unicast+0x15b/0x210
[ 1281.193971]  netlink_sendmsg+0x319/0x390
[ 1281.193974]  sock_sendmsg+0x38/0x50
[ 1281.193975]  ___sys_sendmsg+0x25c/0x270
[ 1281.193978]  ? mem_cgroup_commit_charge+0x76/0xf0
[ 1281.193981]  ? page_add_new_anon_rmap+0x89/0xc0
[ 1281.193984]  ? lru_cache_add_active_or_unevictable+0x35/0xb0
[ 1281.193985]  ? __handle_mm_fault+0x4e9/0x1170
[ 1281.193987]  __sys_sendmsg+0x45/0x80
[ 1281.193989]  S

Re: [PATCH iproute2 net-next 2/3] iproute: add support for SR-IPv6 lwtunnel encapsulation

2017-04-14 Thread Stephen Hemminger

On Fri, 14 Apr 2017 14:36:22 +0200
David Lebrun  wrote:

> + if (!tb[SEG6_IPTUNNEL_SRH])
> + return

Bad indentation?

Re: [PATCH iproute2 net-next 1/3] ip: add ip sr command to control SR-IPv6 internal structures

2017-04-14 Thread Stephen Hemminger

On Fri, 14 Apr 2017 14:36:21 +0200
David Lebrun  wrote:

> +static struct {
> + int cmd;
Why not unsigned? you only assign positive values
> + struct in6_addr addr;
> + __u32 keyid;
> + char *pass;
Why not const char *? or do you free the value on exit?
> + __u8 alg_id;
> +} opts;
> +

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Jakub Kicinski

On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote:
> On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > We are consistently finding that there is this real need to
> > > communicate XDP capabilities, or somehow verify that the needs
> > > of an XDP program can be satisfied by a given implementation.  
> > 
> > I fully agree that we need some way to express capabilities[1]
> >   
> > > Maximum headroom is just one.  
> 
> I don't like the idea of asking program author to explain capabilities
> to the kernel. Right now the verifier already understands more about
> the program than human does. If the verifier cannot deduct from the
> insns what program will be doing, it's pretty much guarantee
> that program author has no idea either.
> If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD,
> the users will just pass something like 64 or 128 whereas the program
> might only be doing IPIP encap and that will cause kernel to
> provide extra headroom for no good reason or reject the program
> whereas it could have run just fine.
> 
> So I very much agree with this part:
> > ... or somehow verify that the needs
> > of an XDP program can be satisfied by a given implementation.  
> 
> we already have cb_access, dst_needed, xdp_adjust_head flags
> that verifier discovers in the program.
> For headroom we need one more. The verifier can see
> the constant passed into bpf_xdp_adjust_head().
> It's trickier if it's a variable, but the verifier can estimate min/max
> of the variable already and worst case it can say that it
> will be XDP_PACKET_HEADROOM.
> If the program is doing variable length bpf_xdp_adjust_head(),
> the human has no idea how much they need and cannot tell kernel anyway.

If I understand correctly that the current proposal is to:

1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend.
2. Make verifier try to lower that through byte code analysis.
3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend).

Would that not cause unnecessary load failures?  I'm worried it would
force users to do things like this:

struct map_value *val;

if (val->prep_len > 64) /* Prove to verifier this is short */
return XDP_DROP;
if (xdp_adjust_head(xdp, -val->prep_len))
...

Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards
and then be in for a nasty surprise when they try to load the same
program for Intel parts :(

I agree that trying to express capabilities to user space is quickly
going to become more confusing than helpful.  However, I think here we
need the opposite - we need an opt-in way of program requesting
guarantees about the datapath.

We may expect most users won't care and we may want to still depend on
the verifier to analyze the program and enable optimizations, but
depending on the verifier for proving prepend lengths is scary.  Or are
we just discussion the optimizations here and not the guarantees about
headroom availability?

ipv6 udp early demux breaks udp_l3mdev_accept=0

2017-04-14 Thread David Ahern

Subash:

My understanding of early demux is that it should only match connected
sockets (src/sport + dst/dport are all set). The ipv6 udp early demux
code added in 5425077d73e0c is doing a generic socket lookup which does
not require an exact match and accordingly breaks udp_l3mdev_accept=0.
Please fix.

David

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Daniel Borkmann


On 04/14/2017 09:28 PM, Alexei Starovoitov wrote:

On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote:


We are consistently finding that there is this real need to
communicate XDP capabilities, or somehow verify that the needs
of an XDP program can be satisfied by a given implementation.


I fully agree that we need some way to express capabilities[1]


Maximum headroom is just one.


I don't like the idea of asking program author to explain capabilities
to the kernel. Right now the verifier already understands more about
the program than human does. If the verifier cannot deduct from the
insns what program will be doing, it's pretty much guarantee
that program author has no idea either.
If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD,
the users will just pass something like 64 or 128 whereas the program
might only be doing IPIP encap and that will cause kernel to
provide extra headroom for no good reason or reject the program
whereas it could have run just fine.


Fully agree, such an extension is likely to be used wrongly or with
some default size as we have right now with XDP_PACKET_HEADROOM to
cover most use cases in order to not bother the user to deal with
this resp. not to complicate things more.


So I very much agree with this part:

... or somehow verify that the needs
of an XDP program can be satisfied by a given implementation.


we already have cb_access, dst_needed, xdp_adjust_head flags
that verifier discovers in the program.
For headroom we need one more. The verifier can see
the constant passed into bpf_xdp_adjust_head().
It's trickier if it's a variable, but the verifier can estimate min/max
of the variable already and worst case it can say that it
will be XDP_PACKET_HEADROOM.


+1, we should try hard to reuse such information from the verifier
to determine specific requirements the program has, and check against
them at prog attach time. This works okay so far for the already
mentioned bits in struct bpf_prog, and could be further extended.


If the program is doing variable length bpf_xdp_adjust_head(),
the human has no idea how much they need and cannot tell kernel anyway.

Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation

2017-04-14 Thread Stephen Hemminger

On Fri, 14 Apr 2017 15:57:22 -0600
Subash Abhinov Kasiviswanathan  wrote:

> >> +  rmnet_kfree_skb
> >> +  (skb,
> >> +   RMNET_STATS_SKBFREE_INGRESS_NOT_EXPECT_MAPD);  
> > 
> > very odd formatting. Please fix.
> >   
> Checkpatch complains if it is over 80 chars in a line, so I had to do 
> this.
> I'll change to a single line.

Either ignore checkpatch, it is only a dumb script; or better yet refactor
the code so it isn't indented so much.

Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation

2017-04-14 Thread Subash Abhinov Kasiviswanathan


No, please use standard facilities in order to do debug logging.


+static struct rmnet_logical_ep_conf_s *_rmnet_get_logical_ep


Dont use "_" at the start of a name without purpose


+static int rmnet_associate_network_device(struct net_device *dev)


I would expect to see here you making connection between real_dev and
rmnet dev. I don't see such thing. Name of the function is misleading.


+
+   config = kmalloc(sizeof(*config), GFP_ATOMIC);


kzalloc, and you don't have to zero the memory.


+static struct notifier_block rmnet_dev_notifier = {


You should add "__read_mostly"


+   .notifier_call = rmnet_config_notify_cb,
+   .next = 0,
+   .priority = 0


This initialization of 0s is not needed.

+/* rmnet_vnd_is_vnd() gives mux_id + 1, so subtract 1 to get the 
correct mux_id

+ */


Fix this comment format.
+void rmnet_print_packet(const struct sk_buff *skb, const char *dev, 
char dir)


No reason to have this function. One can use P_ALL tap to get skbs to
userspace.


+/* rmnet_bridge_handler() - Bridge related functionality
+ */


Fix the comment format (you have it on multiple places)


+static rx_handler_result_t rmnet_bridge_handler
+   (struct sk_buff *skb, struct rmnet_logical_ep_conf_s *ep)


The formatting is incorrect:

static rx_handler_result_t
rmnet_bridge_handler(struct sk_buff *skb, struct 
rmnet_logical_ep_conf_s *ep)




+   config = (struct rmnet_phys_ep_conf_s *)
+   rcu_dereference(skb->dev->rx_handler_data);
+
+   if (!config) {


Cannot happen. Please remove this.


+   .ndo_set_mac_address = 0,
+   .ndo_validate_addr = 0,


These are NULL by default. No need to init.


+   if ((id < 0) || (id >= RMNET_MAX_VND) || !rmnet_devices[id])


Unneeded inner "()"s. I see you have it on multiple places.


+
+   dev_conf = (struct rmnet_vnd_private_s *)netdev_priv(dev);


The typecast is not needed since netdev_priv is void*. You have it all
over the code.


+#define ETH_P_MAP  0xDA1A  /* Multiplexing and Aggregation Protocol
+*  NOT AN OFFICIALLY REGISTERED ID ]


Please push this and ARPHRD_RAWIP as separate patches, to increase the
visibility.


+enum {
+   IFLA_RMNET_UNSPEC,
+   IFLA_RMNET_MUX_ID,
+   __IFLA_RMNET_MAX,
+};


This belongs to include/uapi/linux/if_link.h
Please see IFLA_BOND_* as example

+#define RMNET_INIT_OK 0
+#define RMNET_INIT_ERROR  1


Please use common error codes (0/-ENOMEM/-EINVAL/...)


+   static u32 rmnet_mod_mask = X


Don't use this custom helpers. Use existing loggign facilities.


+   pr_notice("[RMNET:LOW] %s(): " fmt "\n", __func__, \
+   ##__VA_ARGS__); \
+   } while (0)


These look scarry. Please use netdev_err, dev_err and others instead.


+unsigned int rmnet_log_module_mask;
+module_param(rmnet_log_module_mask, uint, 0644);
+MODULE_PARM_DESC(rmnet_log_module_mask, "Logging module mask");


No module options please.


+   config = (struct rmnet_phys_ep_conf_s *)
+   rcu_dereference(skb->dev->rx_handler_data);


This is certainly a misuse of dev->rx_handler_data. Dev private of a
function arg to carry the pointer around.


+struct net_device *rmnet_devices[RMNET_MAX_VND];


Avoid this global variable.



Hi Jiri

I'll make these changes.


+   if (!data[IFLA_RMNET_MUX_ID])
+   return -EINVAL;
+
+   mux_id = nla_get_u16(data[IFLA_VLAN_ID]);


This is a bug I believe... ^^^
I'm pretty sure that you did not test this code.



Since both IFLA_VLAN_ID and IFLA_RMNET_MUX_ID are 1 from enum, this 
actually

works. I was using VLAN as a reference, so I missed to change this to
IFLA_RMNET_MUX_ID.




+   rmnet_kfree_skb
+   (skb,
+RMNET_STATS_SKBFREE_INGRESS_NOT_EXPECT_MAPD);


very odd formatting. Please fix.

Checkpatch complains if it is over 80 chars in a line, so I had to do 
this.

I'll change to a single line.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

a Linux Foundation Collaborative Project

Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface

2017-04-14 Thread महेश बंडेवार

On Fri, Apr 14, 2017 at 12:30 PM, Jay Vosburgh
 wrote:
>
>
> Chonggang Li  wrote:
>
> >Previously link-local packets excluding LACP (which are handled by
> >the recv_probe) received on bond slave interfaces are delivered to
> >stack with bond-master device with RX_HANDLER_ANOTHER, however all
> >link-local packets are link specific and should be delivered with
> >exact same link/dev.
>
> In what case is the current behavior a problem (my guess would
> be something related to LLDP)?  What if, e.g., the bond is a bridge
> port, will STP frames no longer propagate to the bridge?
>
When a link-local frame made appear to arrive on bond-master, it
looses value since 'the info' about link on which it arrived is lost.
So idea behind this is to pass the frame to the stack with the link on
which it arrived without involving the bonding-master. The same will
happen for the STP frames too. Do you see problem with this approach?

> Also, I think the description would be better if it mentioned
> specifically that the patch is changing how skb->dev is set for link
> local frames (bond->dev vs receiving interface), e.g.,
>
> "[...] however all link-local packets are link specific and
> should be delivered with skb->dev set to the original device."
>
yes, makes sense.

> >Signed-off-by: Chonggang Li 
> >Signed-off-by: Mahesh Bandewar 
> >Signed-off-by: Maciej Żenczykowski 
> >---
> > drivers/net/bonding/bond_main.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_main.c 
> >b/drivers/net/bonding/bond_main.c
> >index 01e4a69af421..aeca3d8541b9 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct 
> >sk_buff **pskb)
> >   }
> >   }
> >
> >+  /* link-local packets should not be passed to master interface */
> >+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
> >+  return RX_HANDLER_PASS;
>
> Since this returns _PASS and not _EXACT, the packet will go
> through the ptype_base packet handlers, so is the comment strictly
> correct?
>
The stack does not have all link-local specific packet-type handlers
registered by default so returning _EXACT would find nothing and
packet will be dropped, right? So returning _PASS will deliver packet
to the stack with skb->dev as the link on which packet arrived and
stack can take whatever (default) action it has to take.

> -J
>
> >   if (bond_should_deliver_exact_match(skb, slave, bond))
> >   return RX_HANDLER_EXACT;
> >
> >--
> >2.12.2.762.g0e3151a226-goog
> >
>
> ---
> -Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH net] ipv6: drop non loopback packets claiming to originate from ::1

2017-04-14 Thread Hannes Frederic Sowa

On Fri, Apr 14, 2017, at 20:22, Florian Westphal wrote:
> We lack a saddr check for ::1. This causes security issues e.g. with acls
> permitting connections from ::1 because of assumption that these
> originate
> from local machine.
> 
> Assuming a source address of ::1 is local seems reasonable.
> RFC4291 doesn't allow such a source address either, so drop such packets.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: Florian Westphal 

Acked-by: Hannes Frederic Sowa 

Thanks!

[PATCH net-next 2/2] hv_netvsc: change netvsc device default duplex to FULL

2017-04-14 Thread Stephen Hemminger

From: Simon Xiao 

The netvsc device supports full duplex by default.
This warnings in log from bonding device which did not like
seeing UNKNOWN duplex.

Signed-off-by: Simon Xiao 
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 76489a06ef3b..fad5a57765b9 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -815,7 +815,7 @@ static void netvsc_init_settings(struct net_device *dev)
struct net_device_context *ndc = netdev_priv(dev);
 
ndc->speed = SPEED_UNKNOWN;
-   ndc->duplex = DUPLEX_UNKNOWN;
+   ndc->duplex = DUPLEX_FULL;
 }
 
 static int netvsc_get_link_ksettings(struct net_device *dev,
-- 
2.11.0

[PATCH net-next 1/2] netvsc: fix RCU warning in get_stats

2017-04-14 Thread Stephen Hemminger

The statistics functionis called with RTNL held during probe
but with RCU held during access from /proc and elsewhere.
This is safe so update the lockdep annotation.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f24c2891dd0c..76489a06ef3b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -897,7 +897,7 @@ static void netvsc_get_stats64(struct net_device *net,
   struct rtnl_link_stats64 *t)
 {
struct net_device_context *ndev_ctx = netdev_priv(net);
-   struct netvsc_device *nvdev = rcu_dereference(ndev_ctx->nvdev);
+   struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
int i;
 
if (!nvdev)
-- 
2.11.0

4.10.9 nok with realtek wlan, atom

2017-04-14 Thread rupert THURNER

On Thu, Feb 9, 2017 at 9:09 PM, Larry Finger  wrote:
> On 02/09/2017 01:43 PM, Bjorn Helgaas wrote:
>>
>> [+cc rtl8192ce folks in case they've seen this]
>>
>> On Thu, Feb 09, 2017 at 03:45:01PM +0100, rupert THURNER wrote:
>>>
>>> hi,
>>>
>>> not technical expert enough, i just wanted to give a short user
>>> feedback. for realtek wlan on atom, kernels up to 4.9.5 are ok, and
>>> kernel 4.10.0-rc7-g926af6273fc6 (arch linux-git version numbering) as
>>> well. kernels 4.9.6, 4.9.7, and 4.9.8 fail, i.e. connection to a WLAN
>>> hotspot is possible then drops, or connecting to wlan fails
>>> alltogether.
>>
>>
>> Thanks very much for your report, and sorry for the inconvenience.
>>
>> v4.10-rc7 works, so I guess we don't need to worry about fixing v4.10.
>>
>> But the stable kernels v4.9.6, v4.9.7, and v4.9.8 are broken, so we
>> need to figure out why and make sure we fix the v4.9 stable series.
>>
>> I can't tell yet whether this is PCI-related or not.  If it is,
>> 4922a6a5cfa7 ("PCI: Enumerate switches below PCI-to-PCIe bridges")
>> appeared in v4.9.6, and there is a known issue with that.  The issue
>> should be fixed by 610c2b7ff8f6 ("PCI/ASPM: Handle PCI-to-PCIe bridges
>> as roots of PCIe hierarchies"), which appeared in v4.9.9, so I guess
>> the first thing to do would be to test v4.9.9.
>>
>> If it's not fixed in v4.9.9, can you share the complete dmesg log
>> (output of "dmesg" command) and "lspci -vv" output for v4.9.5 (last
>> known working version) and v4.9.6 (first known broken version)?  On
>> v4.9.6, collect the dmesg output after the failure occurs.
>>
>>> 24: PCI 300.0: 0282 WLAN controller
>>>   [Created at pci.366]
>>>   Model: "Realtek RTL8188CE 802.11b/g/n WiFi Adapter"
>>>   Device: pci 0x8176 "RTL8188CE 802.11b/g/n WiFi Adapter"
>>>   Revision: 0x01
>>>   Driver: "rtl8192ce"
>>>   Driver Modules: "rtl8192ce"
>>>   Device File: wlp3s0
>>>   Features: WLAN
>
>
> It would be helpful if someone were to bisect this issue. The only issue
> that comes to mind was fixed in commit 52f5631a4c05 ("rtlwifi: rtl8192ce:
> Fix loading of incorrect firmware") which is in 4.10-rc7 and will be
> backported to 4.9.
>
> The above issue is one that could not be reproduced on my hardware, thus it
> took Jurij Smakov to find the fix. Without his bisection of the problem, who
> knows how long it would have taken to find my edit mistake.

larry, using the newest kernel 4.10.8 the network connection dropps
again irregular.

# dmesg
[0.00] Linux version 4.10.9-1-ARCH (builduser@tobias) (gcc
version 6.3.1 20170306 (GCC) ) #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST
2017
[   11.933373] rtl8192ce: rtl8192ce: Power Save off (module option)
[   11.933396] rtl8192ce: Using firmware rtlwifi/rtl8192cfw.bin
[   11.978307] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
[   11.978945] rtlwifi: rtlwifi: wireless switch is on

# lspci -vv
Subsystem: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter
Kernel driver in use: rtl8192ce

best,
rupert

Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface

2017-04-14 Thread Maciej Żenczykowski

> In what case is the current behavior a problem (my guess would
> be something related to LLDP)?

LLDP is indeed the case we're trying to solve here.
Listen on one socket and get LLDP for all interfaces in the system.

>  What if, e.g., the bond is a bridge
> port, will STP frames no longer propagate to the bridge?

That's an interesting question.
I don't actually know how this should work.

Should this change perhaps only apply to packets we would otherwise
RX_HANDLER_EXACT?

ie. only affect link local packets on inactive slaves?
but continue reparenting link local packets on active slaves?

That would seem a little inconsistent... but less of a change.

Re: [PATCH block-tree] net: off by one in inet6_pton()

2017-04-14 Thread Jens Axboe

On 04/13/2017 01:42 PM, Dan Carpenter wrote:
> If "scope_len" is sizeof(scope_id) then we would put the NUL terminator
> one space beyond the end of the buffer.

Added, thanks Dan.

-- 
Jens Axboe

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Alexei Starovoitov

On Thu, Apr 13, 2017 at 11:38:10AM -0400, David Miller wrote:
> From: Johannes Berg 
> Date: Thu, 13 Apr 2017 08:10:56 +0200
> 
> > On Wed, 2017-04-12 at 21:20 -0700, Alexei Starovoitov wrote:
> > 
> >> > +if (skb_linearize(skb))
> >> > +goto do_drop;
> >> 
> >> when we discussed supporting jumbo frames in XDP, the idea
> >> was that the program would need to look at first 3+k bytes only
> >> and the rest of the packet will be in non-contiguous pages.
> >> If we do that, it means that XDP program would have to assume
> >> that the packet is more than [data, data_end] and this range
> >> only covers linear part.
> >> If that's the future, we don't need to linearize the skb here
> >> and can let the program access headlen only.
> > 
> > I'm not sure how you think that would work - at least with our (wifi)
> > driver, the headlen should be maybe ETH_HLEN or so at this point. We'd
> > let the program know that it can only look at so much, but then the
> > program can't do anything at all with those frames. At some point then
> > we go back to bpf_skb_load_bytes() being necessary in one form or
> > another, no?
> 
> Agreed, this is completely unusable.  Some wired ethernet drivers do the
> same exact thing.

ahh. i thought all drivers do at least copy-break (256) bytes or copy
get_headlen or build_skb the whole thing.
Since wireless does eth_hlen, then yeah, skb_linearize() is the only way.
I guess any driver that would care about XDP performance would either
implement in-driver XDP or make sure that skb_linearize() doesn't
happen in generic XDP by doing build_skb() with the whole packet.
The driver can be smart and avoid doing copy-break if generic XDP is on.

Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface

2017-04-14 Thread Jay Vosburgh


Chonggang Li  wrote:

>Previously link-local packets excluding LACP (which are handled by
>the recv_probe) received on bond slave interfaces are delivered to
>stack with bond-master device with RX_HANDLER_ANOTHER, however all
>link-local packets are link specific and should be delivered with
>exact same link/dev.

In what case is the current behavior a problem (my guess would
be something related to LLDP)?  What if, e.g., the bond is a bridge
port, will STP frames no longer propagate to the bridge?

Also, I think the description would be better if it mentioned
specifically that the patch is changing how skb->dev is set for link
local frames (bond->dev vs receiving interface), e.g.,

"[...] however all link-local packets are link specific and
should be delivered with skb->dev set to the original device."

>Signed-off-by: Chonggang Li 
>Signed-off-by: Mahesh Bandewar 
>Signed-off-by: Maciej Żenczykowski 
>---
> drivers/net/bonding/bond_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 01e4a69af421..aeca3d8541b9 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct 
>sk_buff **pskb)
>   }
>   }
> 
>+  /* link-local packets should not be passed to master interface */
>+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
>+  return RX_HANDLER_PASS;

Since this returns _PASS and not _EXACT, the packet will go
through the ptype_base packet handlers, so is the comment strictly
correct?

-J

>   if (bond_should_deliver_exact_match(skb, slave, bond))
>   return RX_HANDLER_EXACT;
> 
>-- 
>2.12.2.762.g0e3151a226-goog
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Alexei Starovoitov

On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > We are consistently finding that there is this real need to
> > communicate XDP capabilities, or somehow verify that the needs
> > of an XDP program can be satisfied by a given implementation.
> 
> I fully agree that we need some way to express capabilities[1]
> 
> > Maximum headroom is just one.

I don't like the idea of asking program author to explain capabilities
to the kernel. Right now the verifier already understands more about
the program than human does. If the verifier cannot deduct from the
insns what program will be doing, it's pretty much guarantee
that program author has no idea either.
If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD,
the users will just pass something like 64 or 128 whereas the program
might only be doing IPIP encap and that will cause kernel to
provide extra headroom for no good reason or reject the program
whereas it could have run just fine.

So I very much agree with this part:
> ... or somehow verify that the needs
> of an XDP program can be satisfied by a given implementation.

we already have cb_access, dst_needed, xdp_adjust_head flags
that verifier discovers in the program.
For headroom we need one more. The verifier can see
the constant passed into bpf_xdp_adjust_head().
It's trickier if it's a variable, but the verifier can estimate min/max
of the variable already and worst case it can say that it
will be XDP_PACKET_HEADROOM.
If the program is doing variable length bpf_xdp_adjust_head(),
the human has no idea how much they need and cannot tell kernel anyway.

export pcie_flr and remove copies of it in drivers V2

2017-04-14 Thread Christoph Hellwig

Hi all,

this exports the PCI layer pcie_flr helper, and removes various opencoded
copies of it.

Changes since V1:
 - rebase on top of the pci/virtualization branch
 - fixed the probe case in __pci_dev_reset
 - added ACKs from Bjorn

[PATCH 2/7] PCI: call pcie_flr from reset_intel_82599_sfp_virtfn

2017-04-14 Thread Christoph Hellwig

The 82599 quirk contained an outdated copy of the FLR code.

Signed-off-by: Christoph Hellwig 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/quirks.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 823271b13d12..8195ca294ee5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3641,19 +3641,11 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev 
*dev, int probe)
 *
 * The 82599 supports FLR on VFs, but FLR support is reported only
 * in the PF DEVCAP (sec 9.3.10.4), not in the VF DEVCAP (sec 9.5).
-* Therefore, we can't use pcie_flr(), which checks the VF DEVCAP.
+* Thus we must call pcie_flr directly without first checking if it is
+* supported.
 */
-
-   if (probe)
-   return 0;
-
-   if (!pci_wait_for_pending_transaction(dev))
-   dev_err(&dev->dev, "transaction is not cleared; proceeding with 
reset anyway\n");
-
-   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
-
-   msleep(100);
-
+   if (!probe)
+   pcie_flr(dev);
return 0;
 }
 
-- 
2.11.0

[PATCH 1/7] PCI: export pcie_flr

2017-04-14 Thread Christoph Hellwig

Currently we opencode the FLR sequence in lots of place, export a core
helper instead.  We split out the probing for FLR support as all the
non-core callers already know their hardware.

Note that in the new pci_has_flr function the quirk check has been moved
before the capability check as there is no point in reading the
capability in this case.

Signed-off-by: Christoph Hellwig 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/pci.c   | 39 ---
 include/linux/pci.h |  1 +
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bef14777bb30..957a11a6a840 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3773,27 +3773,41 @@ static void pci_flr_wait(struct pci_dev *dev)
 (i - 1) * 100);
 }
 
-static int pcie_flr(struct pci_dev *dev, int probe)
+/**
+ * pcie_has_flr - check if a device supports function level resets
+ * @dev:   device to check
+ *
+ * Returns true if the device advertises support for PCIe function level
+ * resets.
+ */
+static bool pcie_has_flr(struct pci_dev *dev)
 {
u32 cap;
 
-   pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap);
-   if (!(cap & PCI_EXP_DEVCAP_FLR))
-   return -ENOTTY;
-
if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
-   return -ENOTTY;
+   return false;
 
-   if (probe)
-   return 0;
+   pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap);
+   return cap & PCI_EXP_DEVCAP_FLR;
+}
 
+/**
+ * pcie_flr - initiate a PCIe function level reset
+ * @dev:   device to reset
+ *
+ * Initiate a function level reset on @dev.  The caller should ensure the
+ * device supports FLR before calling this function, e.g. by using the
+ * pcie_has_flr() helper.
+ */
+void pcie_flr(struct pci_dev *dev)
+{
if (!pci_wait_for_pending_transaction(dev))
dev_err(&dev->dev, "timed out waiting for pending transaction; 
performing function level reset anyway\n");
 
pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
pci_flr_wait(dev);
-   return 0;
 }
+EXPORT_SYMBOL_GPL(pcie_flr);
 
 static int pci_af_flr(struct pci_dev *dev, int probe)
 {
@@ -3977,9 +3991,12 @@ static int __pci_dev_reset(struct pci_dev *dev, int 
probe)
if (rc != -ENOTTY)
goto done;
 
-   rc = pcie_flr(dev, probe);
-   if (rc != -ENOTTY)
+   if (pcie_has_flr(dev)) {
+   if (!probe)
+   pcie_flr(dev);
+   rc = 0;
goto done;
+   }
 
rc = pci_af_flr(dev, probe);
if (rc != -ENOTTY)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 20e1865233a4..60162f51227a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1054,6 +1054,7 @@ int pcie_get_mps(struct pci_dev *dev);
 int pcie_set_mps(struct pci_dev *dev, int mps);
 int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
  enum pcie_link_width *width);
+void pcie_flr(struct pci_dev *dev);
 int __pci_reset_function(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
-- 
2.11.0

Re: [PATCH -next] net: phy: test the right variable in phy_write_mmd()

2017-04-14 Thread Andrew Lunn

On Fri, Apr 14, 2017 at 10:10:41PM +0300, Dan Carpenter wrote:
> This is a copy and paste buglet.  We meant to test for ->write_mmd but
> we test for ->read_mmd.
> 
> Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD 
> accessors")
> Signed-off-by: Dan Carpenter 

Reviewed-by: Andrew Lunn 

Andrew

[no subject]

2017-04-14 Thread David Miller


Things seem to be settling down as far as networking is concerned,
let's hope this trend continues...

1) Add iov_iter_revert() and use it to fix the behavior of
   skb_copy_datagram_msg() et al., from Al Viro.

2) Fix the protocol used in the synthetic SKB we cons up for the
   purposes of doing a simulated route lookup for RTM_GETROUTE
   requests.  From Florian Larysch.

3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong
   Wang.

4) Don't call netdev_change_features with the team lock held, from Xin
   Long.

5) Revert TCP F-RTO extension to catch more spurious timeouts because it
   interacts very badly with some middle-boxes.  From Yuchung Cheng.

6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from
   Guillaume Nault.

7) ctnetlink uses bit positions where it should be using bit masks,
   fix from Liping Zhang.

8) Missing RCU locking in netfilter helper code, from Gao Feng.

9) Avoid double frees and use-after-frees in tcp_disconnect(), from
   Eric Dumazet.

10) Don't do a changelink before we register the netdevice in bridging,
from Ido Schimmel.

11) Lock the ipv6 device address list properly, from Rabin Vincent.

Please pull, thanks a lot!

The following changes since commit ea6b1720ce25f92f7a17b2e0c2b653d20773d10a:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-05 
20:17:38 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to f4c13c8ec56e70eeff3e365e0c5fcdad16845b32:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2017-04-14 
10:47:13 -0400)


Al Viro (2):
  [iov_iter] new privimitive: iov_iter_revert()
  make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error

Daniele Palmas (1):
  drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201

David S. Miller (5):
  Merge branch 'for-davem' of git://git.kernel.org/.../viro/vfs
  Merge branch 'l2tp-sockopt-errors'
  Merge tag 'linux-can-fixes-for-4.12-20170404' of 
git://git.kernel.org/.../mkl/linux-can
  Merge branch 'bridge-register-netdev-before-changelink'
  Merge git://git.kernel.org/.../pablo/nf

Eric Dumazet (2):
  netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
  tcp: clear saved_syn in tcp_disconnect()

Florian Larysch (1):
  net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given

Gao Feng (3):
  net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
  netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage

Geert Uytterhoeven (1):
  can: rcar_can: Do not print virtual addresses

Guillaume Nault (2):
  l2tp: don't mask errors in pppol2tp_setsockopt()
  l2tp: don't mask errors in pppol2tp_getsockopt()

Ido Schimmel (2):
  bridge: implement missing ndo_uninit()
  bridge: netlink: register netdevice before executing changelink

Johannes Berg (2):
  bpf: reference may_access_skb() from __bpf_prog_run()
  net: xdp: don't export dev_change_xdp_fd()

Liping Zhang (6):
  netfilter: ctnetlink: using bit to represent the ct event
  netfilter: ctnetlink: make it safer when checking the ct helper name
  netfilter: make it safer during the inet6_dev->addr_list traversal
  netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
  netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
  netfilter: nft_hash: do not dump the auto generated seed

Markus Marb (1):
  can: ifi: use correct register to read rx status

Oliver Neukum (1):
  usbnet: make sure no NULL pointer is passed through

Rabin Vincent (1):
  ipv6: Fix idev->addr_list corruption

WANG Cong (1):
  net_sched: check noop_qdisc before qdisc_hash_add()

Xin Long (2):
  sctp: listen on the sock only when it's state is listening or closed
  team: call netdev_change_features out of team lock

Yuchung Cheng (1):
  tcp: restrict F-RTO to work-around broken middle-boxes

 drivers/net/can/ifi_canfd/ifi_canfd.c |  2 +-
 drivers/net/can/rcar/rcar_can.c   |  3 +--
 drivers/net/team/team.c   | 19 +++
 drivers/net/usb/qmi_wwan.c|  2 +-
 drivers/net/usb/usbnet.c  | 19 +++
 include/linux/uio.h   |  6 +-
 kernel/bpf/core.c | 12 ++--
 lib/iov_iter.c| 63 
+++
 net/bridge/br_device.c| 20 +++-
 net/bridge/br_if.c|  1 -
 net/bridge/br_multicast.c |  7 +--
 net/bridge/br_netlink.c   |  7 +--
 net/bridge/br_private.h   |  5 +
 net/core/datagram.c   | 23 ++-
 net/c

[PATCH 3/7] PCI: call pcie_flr from reset_chelsio_generic_dev

2017-04-14 Thread Christoph Hellwig

Instead of copy & pasting and old version of the code.

Signed-off-by: Christoph Hellwig 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/quirks.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8195ca294ee5..715ed8c08fa3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3750,20 +3750,7 @@ static int reset_chelsio_generic_dev(struct pci_dev 
*dev, int probe)
  PCI_MSIX_FLAGS_ENABLE |
  PCI_MSIX_FLAGS_MASKALL);
 
-   /*
-* Start of pcie_flr() code sequence.  This reset code is a copy of
-* the guts of pcie_flr() because that's not an exported function.
-*/
-
-   if (!pci_wait_for_pending_transaction(dev))
-   dev_err(&dev->dev, "transaction is not cleared; proceeding with 
reset anyway\n");
-
-   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
-   msleep(100);
-
-   /*
-* End of pcie_flr() code sequence.
-*/
+   pcie_flr(dev);
 
/*
 * Restore the configuration information (BAR values, etc.) including
-- 
2.11.0

[PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it

2017-04-14 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 drivers/crypto/qat/qat_common/adf_aer.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/adf_aer.c 
b/drivers/crypto/qat/qat_common/adf_aer.c
index 2839fccdd84b..d3e25c37dc33 100644
--- a/drivers/crypto/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/qat/qat_common/adf_aer.c
@@ -109,20 +109,7 @@ EXPORT_SYMBOL_GPL(adf_reset_sbr);
 
 void adf_reset_flr(struct adf_accel_dev *accel_dev)
 {
-   struct pci_dev *pdev = accel_to_pci_dev(accel_dev);
-   u16 control = 0;
-   int pos = 0;
-
-   dev_info(&GET_DEV(accel_dev), "Function level reset\n");
-   pos = pci_pcie_cap(pdev);
-   if (!pos) {
-   dev_err(&GET_DEV(accel_dev), "Restart device failed\n");
-   return;
-   }
-   pci_read_config_word(pdev, pos + PCI_EXP_DEVCTL, &control);
-   control |= PCI_EXP_DEVCTL_BCR_FLR;
-   pci_write_config_word(pdev, pos + PCI_EXP_DEVCTL, control);
-   msleep(100);
+   pcie_flr(accel_to_pci_dev(accel_dev));
 }
 EXPORT_SYMBOL_GPL(adf_reset_flr);
 
-- 
2.11.0

[PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it

2017-04-14 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a7a430a7be2c..543ddde5f8e2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7112,18 +7112,6 @@ static void ixgbe_watchdog_flush_tx(struct ixgbe_adapter 
*adapter)
 }
 
 #ifdef CONFIG_PCI_IOV
-static inline void ixgbe_issue_vf_flr(struct ixgbe_adapter *adapter,
- struct pci_dev *vfdev)
-{
-   if (!pci_wait_for_pending_transaction(vfdev))
-   e_dev_warn("Issuing VFLR with pending transactions\n");
-
-   e_dev_err("Issuing VFLR for VF %s\n", pci_name(vfdev));
-   pcie_capability_set_word(vfdev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
-
-   msleep(100);
-}
-
 static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
 {
struct ixgbe_hw *hw = &adapter->hw;
@@ -7156,7 +7144,7 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter 
*adapter)
pci_read_config_word(vfdev, PCI_STATUS, &status_reg);
if (status_reg != IXGBE_FAILED_READ_CFG_WORD &&
status_reg & PCI_STATUS_REC_MASTER_ABORT)
-   ixgbe_issue_vf_flr(adapter, vfdev);
+   pcie_flr(vfdev);
}
 }
 
@@ -10244,7 +10232,7 @@ static pci_ers_result_t ixgbe_io_error_detected(struct 
pci_dev *pdev,
 * VFLR.  Just clean up the AER in that case.
 */
if (vfdev) {
-   ixgbe_issue_vf_flr(adapter, vfdev);
+   pcie_flr(vfdev);
/* Free device reference count */
pci_dev_put(vfdev);
}
-- 
2.11.0

[PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it

2017-04-14 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/hw/hfi1/chip.c |  4 ++--
 drivers/infiniband/hw/hfi1/hfi.h  |  1 -
 drivers/infiniband/hw/hfi1/pcie.c | 30 --
 3 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c 
b/drivers/infiniband/hw/hfi1/chip.c
index 121a4c920f1b..d037f72e4d96 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -13610,14 +13610,14 @@ static void init_chip(struct hfi1_devdata *dd)
dd_dev_info(dd, "Resetting CSRs with FLR\n");
 
/* do the FLR, the DC reset will remain */
-   hfi1_pcie_flr(dd);
+   pcie_flr(dd->pcidev);
 
/* restore command and BARs */
restore_pci_variables(dd);
 
if (is_ax(dd)) {
dd_dev_info(dd, "Resetting CSRs with FLR\n");
-   hfi1_pcie_flr(dd);
+   pcie_flr(dd->pcidev);
restore_pci_variables(dd);
}
} else {
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 0808e3c3ba39..40d7559fa723 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1764,7 +1764,6 @@ int hfi1_pcie_init(struct pci_dev *, const struct 
pci_device_id *);
 void hfi1_pcie_cleanup(struct pci_dev *);
 int hfi1_pcie_ddinit(struct hfi1_devdata *, struct pci_dev *);
 void hfi1_pcie_ddcleanup(struct hfi1_devdata *);
-void hfi1_pcie_flr(struct hfi1_devdata *);
 int pcie_speeds(struct hfi1_devdata *);
 void request_msix(struct hfi1_devdata *, u32 *, struct hfi1_msix_entry *);
 void hfi1_enable_intx(struct pci_dev *);
diff --git a/drivers/infiniband/hw/hfi1/pcie.c 
b/drivers/infiniband/hw/hfi1/pcie.c
index 0829fce06172..c81556e84831 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -240,36 +240,6 @@ void hfi1_pcie_ddcleanup(struct hfi1_devdata *dd)
iounmap(dd->piobase);
 }
 
-/*
- * Do a Function Level Reset (FLR) on the device.
- * Based on static function drivers/pci/pci.c:pcie_flr().
- */
-void hfi1_pcie_flr(struct hfi1_devdata *dd)
-{
-   int i;
-   u16 status;
-
-   /* no need to check for the capability - we know the device has it */
-
-   /* wait for Transaction Pending bit to clear, at most a few ms */
-   for (i = 0; i < 4; i++) {
-   if (i)
-   msleep((1 << (i - 1)) * 100);
-
-   pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVSTA, &status);
-   if (!(status & PCI_EXP_DEVSTA_TRPND))
-   goto clear;
-   }
-
-   dd_dev_err(dd, "Transaction Pending bit is not clearing, proceeding 
with reset anyway\n");
-
-clear:
-   pcie_capability_set_word(dd->pcidev, PCI_EXP_DEVCTL,
-PCI_EXP_DEVCTL_BCR_FLR);
-   /* PCIe spec requires the function to be back within 100ms */
-   msleep(100);
-}
-
 static void msix_setup(struct hfi1_devdata *dd, int pos, u32 *msixcnt,
   struct hfi1_msix_entry *hfi1_msix_entry)
 {
-- 
2.11.0

[PATCH 7/7] liquidio: use pcie_flr instead of duplicating it

2017-04-14 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 9d5e03502c76..afdbf7fa016e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -869,20 +869,7 @@ static void octeon_pci_flr(struct octeon_device *oct)
pci_write_config_word(oct->pci_dev, PCI_COMMAND,
  PCI_COMMAND_INTX_DISABLE);
 
-   /* Wait for Transaction Pending bit clean */
-   msleep(100);
-   pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, &status);
-   if (status & PCI_EXP_DEVSTA_TRPND) {
-   dev_info(&oct->pci_dev->dev, "Function reset incomplete after 
100ms, sleeping for 5 seconds\n");
-   ssleep(5);
-   pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA,
- &status);
-   if (status & PCI_EXP_DEVSTA_TRPND)
-   dev_info(&oct->pci_dev->dev, "Function reset still 
incomplete after 5s, reset anyway\n");
-   }
-   pcie_capability_set_word(oct->pci_dev, PCI_EXP_DEVCTL,
-PCI_EXP_DEVCTL_BCR_FLR);
-   mdelay(100);
+   pcie_flr(oct->pci_dev);
 
pci_cfg_access_unlock(oct->pci_dev);
 
-- 
2.11.0

[PATCH -next] net: phy: test the right variable in phy_write_mmd()

2017-04-14 Thread Dan Carpenter

This is a copy and paste buglet.  We meant to test for ->write_mmd but
we test for ->read_mmd.

Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD 
accessors")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 357a4d0d7641..6739b738bbaf 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -76,7 +76,7 @@ int phy_write_mmd(struct phy_device *phydev, int devad, u32 
regnum, u16 val)
if (regnum > (u16)~0 || devad > 32)
return -EINVAL;
 
-   if (phydev->drv->read_mmd) {
+   if (phydev->drv->write_mmd) {
ret = phydev->drv->write_mmd(phydev, devad, regnum, val);
} else if (phydev->is_c45) {
u32 addr = MII_ADDR_C45 | (devad << 16) | (regnum & 0x);

Re: [PATCH net] ipv6: drop non loopback packets claiming to originate from ::1

2017-04-14 Thread Eric Dumazet

On Fri, 2017-04-14 at 20:22 +0200, Florian Westphal wrote:
> We lack a saddr check for ::1. This causes security issues e.g. with acls
> permitting connections from ::1 because of assumption that these originate
> from local machine.
> 
> Assuming a source address of ::1 is local seems reasonable.
> RFC4291 doesn't allow such a source address either, so drop such packets.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: Florian Westphal 
> ---
>  net/ipv6/ip6_input.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Acked-by: Eric Dumazet

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-14 Thread Alexei Starovoitov

On Fri, Apr 14, 2017 at 10:07:40AM +0200, Johannes Berg wrote:
> On Thu, 2017-04-13 at 16:01 -0400, David Miller wrote:
> > From: Johannes Berg 
> > Date: Thu, 13 Apr 2017 21:22:21 +0200
> > 
> > > OTOH, it might depend on the frame data itself, if the program does
> > > something like
> > > 
> > > xdp->data[xdp->data[0] & 0xf]
> > > 
> > > (read or write, doesn't really matter) so then the verifier would
> > have
> > > to take the maximum possible value there into account.
> > 
> > I am not well versed enough with the verifier to understand exactly
> > how and to what extent SKB accesses are validated by the verifier.
> > 
> > My, perhaps mistaken, impression is that access range validation is
> > still at least partially done at run time.
> 
> I think you're right for SKB accesses, but I'm pretty sure that for XDP
> the verifier checks that the program can't possibly access outside of
> [xdp->data, xdp->data_end], see compare_ptrs_to_packet().
> 
> This checks that comparisons to data_end are all there, i.e. that the
> program verifies it may access some bytes before actually doing so.
> However, the program could start with
> 
> if (xdp->data_end < xdp->data + 1024)
>   return DROP;
> [...]
> 
> and then the verifier would consider it safe.
> 
> Still, it has to track down into the [...] code to actually understand
> that it now didn't try to access xdp->data+1025, and as such it should
> be able to determine the maximum desired offset.
> 
> However, I'm coming to realize that may not necessarily mean that the
> program really *needs* to access that data.
> 
> For example, a hypothetical wifi program might want to recalculate and
> compare the CRC checksum (for whatever reason, say a driver/hardware
> bug). This would require accessing the last 4 bytes of the packet,
> which may not be present. The program could, however, accept that
> sometimes this isn't possible, and simply accept frames when it can't
> see the last 4 bytes (or if the last 4 bytes aren't the CRC because
> that isn't reported now, but whatever, I'm handwaving anyway.)
> 
> So perhaps this isn't really a good idea. The program should probably
> separately say how much data it really *needs* there, and then perhaps
> a warning could be emitted if it never accesses the data that it
> advertises as needing (i.e. if it says "I want 1024 bytes" but then
> can't possibly read more than 14)

The above descrption is correct. Also note that the offset within the packet
can be a variable as well, since that's the only way to do variable
length parsing (like ip options), so the verifier cannot tell how far
into the packet the program is going to parse. The program author cannot
tell that either!
Consider samples/bpf/parse_varlen.c:
ihl_len = iph->ihl * 4;

if (iph->protocol == IPPROTO_IPIP) {
iph = data + nh_off + ihl_len;
if (iph + 1 > data_end)
return 0;

'ihl' is 4 bit, so 2nd ip hdr may start at sizeof(eth) + 64 bytes,
but we cannot ask the driver to always provide these many bytes.
Technically we can add a feature to verifier to calculate the max range
into the packet that the program may look at, but it will be large.
Like if the program parses IPIP the range will be 64(first ip) + 64(second ip)
+ 64 (if it looks at tcp options).
If the program parses TLVs then the range will be 64k, so I'm not sure what
would be the value of such verifier feature.

Re: [RFC 1/3] bpf/wireless: add wifimon program type

2017-04-14 Thread Alexei Starovoitov

On Wed, Apr 12, 2017 at 05:30:40PM +0200, Johannes Berg wrote:
> On Wed, 2017-04-12 at 11:19 -0400, David Miller wrote:
> > 
> > > Instead it may make more sense to just have a "wifi_info(skb,
> > info)"
> > > function you can call, e.g. with a structure that has various
> > fields
> > > and flags to see which you care about.
> > 
> > I would advise against this, let the parsing part use __sk_buff and
> > therefore have maximum flexibility.  You really cannot predict the
> > future, so in my opinion it might be unwise to constrain at this
> > point.
> 
> So my point with the wifi_info() function to call from the BPF program
> was just that putting something that's not already in struct sk_buff
> into __sk_buff doesn't really make a lot of sense - we still have to
> actually build it somewhere/somehow without knowing if it's going to be
> needed by the program. We already have things like prog->cb_access and
> prog->dst_needed now, but I'm not sure we want to blow that up much.
> 
> At the same time, technically I *do* have the information in skb->cb,
> but the format thereof is something I really wouldn't want to let
> trickle into the ABI. Therefore, I think if somebody needs something
> like the bitrate, we should have a wifi_info() function that can return
> the bitrate (among other things) without having to pre-build the data.
> We can still cache it in mac80211 if multiple programs are there, dunno
> if that makes sense.
> 
> Nevertheless, I think if I don't use __sk_buff as the program argument
> then things get really messy, so I'll stick to that, and avoid adding
> anything to it as much as possible.

so today only 'len' field is needed, but the plan to add wifi specific
stuff to bpf context?
If so, adding these very wifi specific fields to __sk_buff will confuse
other program types, so new program type (like you did) and new
'struct bpf_wifi_md' are probably cleaner.
Physically the R1 register to bpf program will still be 'struct sk_buff',
but from bpf program side it will be:
struct bpf_wifi_md {
  __u32 len;
  __u32 wifi_things;
};

At the same time if most of the __sk_buff fields can be useful to wifimon,
then just use it as-is (without restricting to 'len' only) and add wifi
specific fields to it with a comment.

Re: eBPF - little-endian load instructions?

2017-04-14 Thread Alexei Starovoitov

On Thu, Apr 13, 2017 at 07:58:45AM +0200, Johannes Berg wrote:
> On Wed, 2017-04-12 at 20:08 -0700, Alexei Starovoitov wrote:
> 
> > it's really llvm bug that i need fix. It's plain broken
> > to generate what effectively is nop insn for march=bpfeb
> > My only excuse that when that code was written llvm had only
> > march=bpfel.
> > bpfeb was added much later.
> 
> So I'm confused now. Is bpf intended to be endian-independent or not?
> It sounded at first like it was, even if I have a hard time imagining
> how that would even work.

bpf takes endianness of the cpu it runs on.
if we said that bpf 32-bit load insn is little endian, it would have screwed up
all big-endian archs and the other way around.
For both classic and extended the corresponding struct sock_filter and bpf_insn
are encoded in target cpu endianness.
The front-end (clang) needs to run in target endianness as well otherwise
the bcc tracing scripts will be accessing garbage.
In other words you cannot take either classic or extended bpf program
generated for x86 and run it on big-endian. Though you can compile on x86
and run on Arm. Hence two targets for llvm -march=bpfeb and -march=bpfel
If you compile for -march=bpfeb and try to load on x86, the instruction stream
will be rejected by the verifier in early stages, since branch insns and
all immediate values will be wrong.

Re: [PATCH 3/4] net: macb: Add hardware PTP support

2017-04-14 Thread Richard Cochran

On Thu, Apr 13, 2017 at 02:39:23PM +0100, Rafal Ozieblo wrote:
> This patch is based on original Harini's patch and Andrei's patch,
> implemented in a separate file to ease the review/maintanance
> and integration with other platforms.

Please see if you can break this patch into 2 parts:

1. SO_TIMESTAMPING
2. PHC support
 
> This driver does support GEM-GXL:

"This driver supports GEM-GXL:"

> - HW time stamp on the PTP Ethernet packets are received using the
>   SO_TIMESTAMPING API. Where timers are obtained from the dma buffer
>   descriptors

This text is poor.  No "timers" are obtained but rather time stamps.
Also, second sentence is not a sentence.  (An english sentence has a
noun and a verb.)

> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 59d459b..603bac1 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -826,6 +826,15 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  
>   /* First, update TX stats if needed */
>   if (skb) {
> +#ifdef CONFIG_MACB_USE_HWSTAMP

No need for ifdef here.  Instead, let gem_ptp_do_txstamp() return -1.

> + if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
> + /* skb now belongs to timestamp buffer
> +  * and will be removed later
> +  */
> + tx_skb->skb = NULL;
> + schedule_work(&queue->tx_ts_task);
> + }
> +#endif
>   netdev_vdbg(bp->dev, "skb %u (data %p) TX 
> complete\n",
>   macb_tx_ring_wrap(bp, tail),
>   skb->data);
> @@ -992,6 +1001,10 @@ static int gem_rx(struct macb *bp, int budget)
>   bp->dev->stats.rx_packets++;
>   bp->dev->stats.rx_bytes += skb->len;
>  
> +#ifdef CONFIG_MACB_USE_HWSTAMP

No ifdef needed.

> + gem_ptp_do_rxstamp(bp, skb, desc);
> +#endif
> +
>  #if defined(DEBUG) && defined(VERBOSE_DEBUG)
>   netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
>   skb->len, skb->csum);
> @@ -1314,6 +1327,11 @@ static irqreturn_t macb_interrupt(int irq, void 
> *dev_id)
>   queue_writel(queue, ISR, MACB_BIT(HRESP));
>   }
>  
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + if (status & MACB_PTP_INT_MASK)

Can't you use IS_ENABLED(CONFIG_MACB_USE_HWSTAMP) here?

> + macb_ptp_int(queue, status);
> +#endif
> +
>   status = queue_readl(queue, ISR);
>   }
>  
> @@ -1643,8 +1661,10 @@ static int macb_start_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>  
>   /* Make newly initialized descriptor visible to hardware */
>   wmb();
> -
> - skb_tx_timestamp(skb);
> +#ifdef CONFIG_MACB_USE_HWSTAMP
> + if (!bp->ptp_hw_support)
> +#endif
> + skb_tx_timestamp(skb);

This is wrong.  You should call skb_tx_timestamp() unconditionally,
but be sure to set SKBTX_IN_PROGRESS when appropriate.

> diff --git a/drivers/net/ethernet/cadence/macb.h 
> b/drivers/net/ethernet/cadence/macb.h
> index 2606970..5295045 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -11,6 +11,9 @@
>  #define _MACB_H
>  
>  #include 
> +#include 

You don't need to include ptp_clock.h.

> +#include 
> +#include 
>  
>  #if defined(CONFIG_ARCH_DMA_ADDR_T_64BIT) || defined(CONFIG_MACB_USE_HWSTAMP)
>  #define MACB_EXT_DESC

...

> @@ -527,6 +595,8 @@
>  #define queue_readl(queue, reg)  
> (queue)->bp->macb_reg_readl((queue)->bp, (queue)->reg)
>  #define queue_writel(queue, reg, value)  
> (queue)->bp->macb_reg_writel((queue)->bp, (queue)->reg, (value))
>  
> +#define PTP_TS_BUFFER_SIZE   128 /* must be power of 2 */
> +
>  /* Conditional GEM/MACB macros.  These perform the operation to the correct
>   * register dependent on whether the device is a GEM or a MACB.  For 
> registers
>   * and bitfields that are common across both devices, use macb_{read,write}l
> @@ -889,6 +959,18 @@ struct macb_config {
>   int jumbo_max_len;
>  };
>  
> +#ifdef CONFIG_MACB_USE_HWSTAMP

No need for ifdef here.

> +struct tsu_incr {
> + u32 sub_ns;
> + u32 ns;
> +};
> +
> +struct gem_tx_ts {
> + struct sk_buff *skb;
> + struct macb_dma_desc_ptp desc_ptp;
> +};
> +#endif
> +
>  struct macb_queue {
>   struct macb *bp;
>   int irq;

...

> diff --git a/drivers/net/ethernet/cadence/macb_ptp.c 
> b/drivers/net/ethernet/cadence/macb_ptp.c
> new file mode 100755
> index 000..72a79c4
> --- /dev/null
> +++ b/drivers/net/ethernet/cadence/macb_ptp.c
> @@ -0,0 +1,724 @@
> +/**
> + * 1588 PTP support for Cadence G

Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode

2017-04-14 Thread Samudrala, Sridhar




On 4/14/2017 9:47 AM, Alexander Duyck wrote:

On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala
 wrote:

In switchdev mode, broadcasts from VFs are received by the PF and passed
to corresponding port representor netdev.
Any frames sent via port netdevs are sent as directed transmits to the
corresponding VFs. To enable directed transmit, skb metadata dst is used
to pass the port id and the frame is requeued to call the PFs transmit
routine. VF id is used as port id for VFs and PF port id is defined as
I40_MAIN_VSI_PORT_ID.

Small script to demonstrate inter VF and PF to VF pings in switchdev mode.
PF: p4p1, VFs: p4p1_0,p4p1_1 VF Port Reps:p4p1-vf0, p4p1-vf1
PF Port rep: p4p1-pf

# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/:05:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip link set p4p1 vf 0 mac 00:11:22:33:44:55
# ip link set p4p1 vf 1 mac 00:11:22:33:44:56
# rmmod i40evf; modprobe i40evf

/* Create 2 namespaces and move the VFs to the corresponding ns */
# ip netns add ns0
# ip link set p4p1_0 netns ns0
# ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
# ip netns exec ns0 ip link set p4p1_0 up
# ip netns add ns1
# ip link set p4p1_1 netns ns1
# ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
# ip netns exec ns1 ip link set p4p1_1 up

/* bring up pf and port netdevs */
# ip addr add 192.168.1.1/24 dev p4p1
# ip link set p4p1 up
# ip link set p4p1-vf0 up
# ip link set p4p1-vf1 up
# ip link set p4p1-pf up

# ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
# ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
# ping -c3 192.168.1.10   /* PF -> VF0 */
# ping -c3 192.168.1.11   /* PF -> VF1 */

/* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
# ip netns exec ns0 ping -c1 -W1 192.168.1.200
/* VF1 -> IP in same subnet -  broadcasts will be seen on p4p1-vf1 & p4p1*/
# ip netns exec ns0 ping -c1 -W1 192.168.1.200
/* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
# ping -I p4p1-vf0 -c1 -W1 192.168.1.200
/* port rep VF1 -> IP in same subnet  - broadcasts will be seen on p4p1_1 */
# ping -I p4p1-vf1 -c1 -W1 192.168.1.200

Signed-off-by: Sridhar Samudrala 
---
  drivers/net/ethernet/intel/i40e/i40e.h |   4 +
  drivers/net/ethernet/intel/i40e/i40e_main.c|  27 +++-
  drivers/net/ethernet/intel/i40e/i40e_txrx.c| 148 -
  drivers/net/ethernet/intel/i40e/i40e_txrx.h|   2 +
  drivers/net/ethernet/intel/i40e/i40e_type.h|   3 +
  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
  6 files changed, 184 insertions(+), 8 deletions(-)





diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ebffca0..86d2510 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring 
*rx_ring,
  }

  /**
+ * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame
+ * @rx_ring: rx ring in play
+ * @skb: packet to send up
+ **/
+static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff 
*skb)
+{
+   struct i40e_q_vector *q_vector = rx_ring->q_vector;
+   struct i40e_pf *pf = rx_ring->vsi->back;
+   struct sk_buff *nskb;
+   struct i40e_vf *vf;
+   struct ethhdr *eth;
+   int vf_id;
+
+   if ((skb->pkt_type != PACKET_BROADCAST) &&
+   (skb->pkt_type != PACKET_MULTICAST) &&
+   (skb->pkt_type != PACKET_OTHERHOST))
+   return;
+
+   eth = (struct ethhdr *)skb_mac_header(skb);
+
+   /* If a loopback packet is received in switchdev mode, clone the skb
+* and pass it to the corresponding port netdev based on the source MAC.
+*/
+   for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
+   vf = &pf->vf[vf_id];
+   if (ether_addr_equal(eth->h_source,
+vf->default_lan_addr.addr)) {
+   nskb = skb_clone(skb, GFP_ATOMIC);
+   if (!nskb)
+   break;
+   nskb->offload_fwd_mark = 1;

So this line is causing build errors when switchdev is not enabled.
This whole function should probably be wrapped in a check to see if
switchdev support is enabled or not.

Yes. will fix it in the next revision.

Thanks
Sridhar

[PATCH net] ipv6: drop non loopback packets claiming to originate from ::1

2017-04-14 Thread Florian Westphal

We lack a saddr check for ::1. This causes security issues e.g. with acls
permitting connections from ::1 because of assumption that these originate
from local machine.

Assuming a source address of ::1 is local seems reasonable.
RFC4291 doesn't allow such a source address either, so drop such packets.

Reported-by: Eric Dumazet 
Signed-off-by: Florian Westphal 
---
 net/ipv6/ip6_input.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index aacfb4bce153..c45b12b4431c 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -122,11 +122,14 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, 
struct packet_type *pt
max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
/*
 * RFC4291 2.5.3
+* The loopback address must not be used as the source address in IPv6
+* packets that are sent outside of a single node. [..]
 * A packet received on an interface with a destination address
 * of loopback must be dropped.
 */
-   if (!(dev->flags & IFF_LOOPBACK) &&
-   ipv6_addr_loopback(&hdr->daddr))
+   if ((ipv6_addr_loopback(&hdr->saddr) ||
+ipv6_addr_loopback(&hdr->daddr)) &&
+!(dev->flags & IFF_LOOPBACK))
goto err;
 
/* RFC4291 Errata ID: 3480
-- 
2.10.2

pull request: bluetooth-next 2017-04-14

2017-04-14 Thread Johan Hedberg

Hi Dave,

Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
kernel.

 - Many fixes to 6LoWPAN, in particular for BLE
 - New CA8210 IEEE 802.15.4 device driver (accounting for most of the
   lines of code added in this pull request)
 - Added Nokia Bluetooth (UART) HCI driver
 - Some serdev & TTY changes that are dependencies for the Nokia
   driver (with acks from relevant maintainers and an agreement that
   these come through the bluetooth tree)
 - Support for new Intel Bluetooth device
 - Various other minor cleanups/fixes here and there

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit d92be7a41ef15463eb816a4a2d42bf094b56dfce:

  net: make struct net_device::min_header_len 8-bit (2017-04-12 13:59:21 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git 
for-upstream

for you to fetch changes up to 019aa56b7df8a796b2c01a56269a370ad3442ec7:

  arm64: dts: hikey: add WL1835 Bluetooth device node (2017-04-13 19:22:53 
+0200)


Alexander Aring (2):
  6lowpan: iphc: override l2 packet information
  ipv6: addrconf: fix 48 bit 6lowpan autoconfiguration

Andy Shevchenko (1):
  Bluetooth: hci_bcm: Support platform enumeration

Colin Ian King (4):
  Bluetooth: btmrvl: fix spelling mistake: "unregester" -> "unregister"
  Bluetooth: fix assignments on error variable err
  6lowpan: fix assignment of peer_addr
  ieee802154: ca8210: Add checks for kmalloc allocation failures

Dean Jenkins (2):
  Bluetooth: Handle bt_accept_enqueue() socket atomically
  Bluetooth: Avoid bt_accept_unlink() double unlinking

Elena Reshetova (1):
  Bluetooth: convert rfcomm_dlc.refcnt from atomic_t to refcount_t

Gabriel (1):
  Bluetooth: Added support for Rivet Networks Killer 1535

Geliang Tang (1):
  Bluetooth: bluecard: use setup_timer

Harry Morris (3):
  ieee802154: Add CA8210 IEEE 802.15.4 device driver
  ieee802154: Add device tree documentation for CA8210
  ieee802154: Add entry in MAINTAINTERS for CA8210 driver

Jeffy Chen (2):
  Bluetooth: btusb: wake system up when receives a wake irq
  Bluetooth: btmrvl: wake system up when receives a wake irq

Johan Hovold (2):
  Bluetooth: hci_bcm: add missing tty-device sanity check
  Bluetooth: hci_intel: add missing tty-device sanity check

John Keeping (1):
  Bluetooth: hci_bcm: Fix clock (un)prepare

Jonas Holmberg (1):
  Bluetooth: Change initial min and max interval

Larry Finger (1):
  Bluetooth: btrtl: Change message for missing config file

Luiz Augusto von Dentz (10):
  6lowpan: Use netdev addr_len to determine lladdr len
  6lowpan: Fix IID format for Bluetooth
  Bluetooth: 6lowpan: Remove unnecessary peer lookup
  Bluetooth: 6lowpan: Print errors during recv_pkt
  Bluetooth: L2CAP: Don't return -EAGAIN if out of credits
  6lowpan: Don't set IFF_NO_QUEUE
  Bluetooth: 6lowpan: Don't drop packets when run out of credits
  Bluetooth: 6lowpan: Use netif APIs to flow control
  Bluetooth: L2CAP: Add l2cap_le_flowctl_send
  Bluetooth: 6lowpan: Set tx_queue_len to DEFAULT_TX_QUEUE_LEN

Marcel Holtmann (1):
  Bluetooth: btusb: Add support for Intel Bluetooth devices 9160/9260 
[8087:0025]

Marcin Kraglak (1):
  Bluetooth: L2CAP: Fix L2CAP_CR_SCID_IN_USE value

Michael Scott (2):
  Bluetooth: 6lowpan: fix delay work init in add_peer_chan()
  Bluetooth: 6lowpan: fix use after free in chan_suspend/resume

Patrik Flykt (3):
  bluetooth: Set 6 byte device addresses
  6lowpan: Set MAC address length according to LOWPAN_LLTYPE
  bluetooth: Do not set IFF_POINTOPOINT

Rob Herring (5):
  Bluetooth: hci_uart: add serdev driver support library
  dt-bindings: net: Add TI WiLink shared transport binding
  bluetooth: hci_uart: remove unused hci_uart_init_tty
  bluetooth: hci_uart: add LL protocol serdev driver support
  arm64: dts: hikey: add WL1835 Bluetooth device node

Sebastian Reichel (9):
  tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
  serdev: add serdev_device_wait_until_sent
  serdev: implement get/set tiocm
  serdev: add helpers for cts and rts handling
  Bluetooth: hci_uart: add support for word alignment
  Bluetooth: hci_serdev: do not open device in hci open
  Bluetooth: hci_serdev: allow modular drivers
  dt-bindings: net: bluetooth: Add nokia-bluetooth
  Bluetooth: add nokia driver

Tedd Ho-Jeong An (3):
  Bluetooth: Use switch statement for Intel hardware variants
  Bluetooth: hci_intel: Fix firmware file name to use hw_variant
  Bluetooth: hci_intel: Add support Intel Bluetooth device 9160/9260 for 
UART

Xinming Hu (2):
  Bluetooth: btmrvl: disable platform wakeup interrupt in suspend failure 
path
  Bluetooth: btmrvl: remove unnecessary

[PATCH net-next] bonding: do not pass link-local packets to master-interface

2017-04-14 Thread Chonggang Li

Previously link-local packets excluding LACP (which are handled by
the recv_probe) received on bond slave interfaces are delivered to
stack with bond-master device with RX_HANDLER_ANOTHER, however all
link-local packets are link specific and should be delivered with
exact same link/dev.

Signed-off-by: Chonggang Li 
Signed-off-by: Mahesh Bandewar 
Signed-off-by: Maciej Żenczykowski 
---
 drivers/net/bonding/bond_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 01e4a69af421..aeca3d8541b9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct 
sk_buff **pskb)
}
}
 
+   /* link-local packets should not be passed to master interface */
+   if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
+   return RX_HANDLER_PASS;
if (bond_should_deliver_exact_match(skb, slave, bond))
return RX_HANDLER_EXACT;
 
-- 
2.12.2.762.g0e3151a226-goog

Re: [PATCH net-next v2 6/6] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses

2017-04-14 Thread Stephen Hemminger

On Fri, 14 Apr 2017 18:44:46 +0200
Matthias Schiffer  wrote:

> As link-local addresses are only valid for a single interface, we can allow
> to use the same VNI for multiple independent VXLANs, as long as the used
> interfaces are distinct. This way, VXLANs can always be used as a drop-in
> replacement for VLANs with greater ID space.
> 
> This also extends VNI lookup to respect the ifindex when link-local IPv6
> addresses are used, so using the same VNI on multiple interfaces can
> actually work.
> 
> Signed-off-by: Matthias Schiffer 

Why does this have to be IPv6 specific?

What about the case where VXLAN is not bound to an interface?
If that is used then that could be a problem.

Re: [PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes

2017-04-14 Thread Stephen Hemminger

On Fri, 14 Apr 2017 18:44:44 +0200
Matthias Schiffer  wrote:

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 07f89b037681..95a71546e8f2 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, 
> struct vxlan_config *conf,
>   if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family)
>   return -EINVAL;
>  
> + if (vxlan_addr_multicast(&conf->saddr))
> + return -EINVAL;
> +
>   if (conf->saddr.sa.sa_family == AF_INET6) {
>   if (!IS_ENABLED(CONFIG_IPV6))
>   return -EPFNOSUPPORT;
>   use_ipv6 = true;
>   conf->flags |= VXLAN_F_IPV6;
> +
> + if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) {
> + int local_type =
> + ipv6_addr_type(&conf->saddr.sin6.sin6_addr);
> + int remote_type =
> + ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr);
> +
> + if (local_type & IPV6_ADDR_LINKLOCAL) {
> + if (!(remote_type & IPV6_ADDR_LINKLOCAL) &&
> + (remote_type != IPV6_ADDR_ANY)) {
> + pr_info("invalid combination of address 
> scopes\n");

It is always helpful to include device if possible in error message.
netdev_notice(old->dev, " invalid 
combination of address scopes\n");
Also vxlan is good candidate for extended netlink error reporting.

[PATCH net-next 1/6] bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU

2017-04-14 Thread Martin KaFai Lau

test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
It just happens to work when PERCPU_FREE_TARGET == 16.

This patch:
1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
2) Add test_lru_sanity6 to test list rotation for
   the BPF_F_NO_COMMON_LRU map.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 tools/testing/selftests/bpf/test_lru_map.c | 70 +++---
 1 file changed, 65 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 00b0aff56e2e..04b2242937cc 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -387,6 +387,10 @@ static void test_lru_sanity3(int map_type, int map_flags, 
unsigned int tgt_free)
unsigned int map_size;
int next_cpu = 0;
 
+   if (map_flags & BPF_F_NO_COMMON_LRU)
+   /* This test is only applicable to common LRU list */
+   return;
+
printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type,
   map_flags);
 
@@ -396,11 +400,7 @@ static void test_lru_sanity3(int map_type, int map_flags, 
unsigned int tgt_free)
assert(batch_size * 2 == tgt_free);
 
map_size = tgt_free * 2;
-   if (map_flags & BPF_F_NO_COMMON_LRU)
-   lru_map_fd = create_map(map_type, map_flags,
-   map_size * nr_cpus);
-   else
-   lru_map_fd = create_map(map_type, map_flags, map_size);
+   lru_map_fd = create_map(map_type, map_flags, map_size);
assert(lru_map_fd != -1);
 
expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size);
@@ -566,6 +566,65 @@ static void test_lru_sanity5(int map_type, int map_flags)
printf("Pass\n");
 }
 
+/* Test list rotation for BPF_F_NO_COMMON_LRU map */
+static void test_lru_sanity6(int map_type, int map_flags, int tgt_free)
+{
+   int lru_map_fd, expected_map_fd;
+   unsigned long long key, value[nr_cpus];
+   unsigned int map_size = tgt_free * 2;
+   int next_cpu = 0;
+
+   if (!(map_flags & BPF_F_NO_COMMON_LRU))
+   return;
+
+   printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type,
+  map_flags);
+
+   assert(sched_next_online(0, &next_cpu) != -1);
+
+   expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size);
+   assert(expected_map_fd != -1);
+
+   lru_map_fd = create_map(map_type, map_flags, map_size * nr_cpus);
+   assert(lru_map_fd != -1);
+
+   value[0] = 1234;
+
+   for (key = 1; key <= tgt_free; key++) {
+   assert(!bpf_map_update_elem(lru_map_fd, &key, value,
+   BPF_NOEXIST));
+   assert(!bpf_map_update_elem(expected_map_fd, &key, value,
+   BPF_NOEXIST));
+   }
+
+   for (; key <= tgt_free * 2; key++) {
+   unsigned long long stable_key;
+
+   /* Make ref bit sticky for key: [1, tgt_free] */
+   for (stable_key = 1; stable_key <= tgt_free; stable_key++) {
+   /* Mark the ref bit */
+   assert(!bpf_map_lookup_elem(lru_map_fd, &stable_key,
+   value));
+   }
+   assert(!bpf_map_update_elem(lru_map_fd, &key, value,
+   BPF_NOEXIST));
+   }
+
+   for (; key <= tgt_free * 3; key++) {
+   assert(!bpf_map_update_elem(lru_map_fd, &key, value,
+   BPF_NOEXIST));
+   assert(!bpf_map_update_elem(expected_map_fd, &key, value,
+   BPF_NOEXIST));
+   }
+
+   assert(map_equal(lru_map_fd, expected_map_fd));
+
+   close(expected_map_fd);
+   close(lru_map_fd);
+
+   printf("Pass\n");
+}
+
 int main(int argc, char **argv)
 {
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -593,6 +652,7 @@ int main(int argc, char **argv)
test_lru_sanity3(map_types[t], map_flags[f], tgt_free);
test_lru_sanity4(map_types[t], map_flags[f], tgt_free);
test_lru_sanity5(map_types[t], map_flags[f]);
+   test_lru_sanity6(map_types[t], map_flags[f], tgt_free);
 
printf("\n");
}
-- 
2.9.3

[PATCH net-next 6/6] bpf: lru: Add map-in-map LRU example

2017-04-14 Thread Martin KaFai Lau

This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.

BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 1000 1000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
> map_perf_test 512 8 126 8000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Notes that the max_entries for the map-in-map LRU test is 126 which
is the max_entries for each inner LRU map.  8 processes have been
started, so 8 * 126 = 1008 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.

Signed-off-by: Martin KaFai Lau 
---
 samples/bpf/map_perf_test_kern.c | 34 --
 samples/bpf/map_perf_test_user.c | 62 
 2 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 404ed53b8a53..245165817fbe 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -11,6 +11,7 @@
 #include "bpf_helpers.h"
 
 #define MAX_ENTRIES 1000
+#define MAX_NR_CPUS 1024
 
 struct bpf_map_def SEC("maps") hash_map = {
.type = BPF_MAP_TYPE_HASH,
@@ -34,6 +35,19 @@ struct bpf_map_def SEC("maps") nocommon_lru_hash_map = {
.map_flags = BPF_F_NO_COMMON_LRU,
 };
 
+struct bpf_map_def SEC("maps") inner_lru_hash_map = {
+   .type = BPF_MAP_TYPE_LRU_HASH,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = MAX_ENTRIES,
+};
+
+struct bpf_map_def SEC("maps") array_of_lru_hashs = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(u32),
+   .max_entries = MAX_NR_CPUS,
+};
+
 struct bpf_map_def SEC("maps") percpu_hash_map = {
.type = BPF_MAP_TYPE_PERCPU_HASH,
.key_size = sizeof(u32),
@@ -154,13 +168,27 @@ int stress_lru_hmap_alloc(struct pt_regs *ctx)
 
test_case = dst6[7];
 
-   if (test_case == 0)
+   if (test_case == 0) {
ret = bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY);
-   else if (test_case == 1)
+   } else if (test_case == 1) {
ret = bpf_map_update_elem(&nocommon_lru_hash_map, &key, &val,
  BPF_ANY);
-   else
+   } else if (test_case == 2) {
+   void *nolocal_lru_map;
+   int cpu = bpf_get_smp_processor_id();
+
+   nolocal_lru_map = bpf_map_lookup_elem(&array_of_lru_hashs,
+ &cpu);
+   if (!nolocal_lru_map) {
+   ret = -ENOENT;
+   goto done;
+   }
+
+   ret = bpf_map_update_elem(nolocal_lru_map, &key, &val,
+ BPF_ANY);
+   } else {
ret = -EINVAL;
+   }
 
 done:
if (ret)
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 2a12f48b5c6d..6ac778153315 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -25,6 +25,7 @@
 #include "bpf_load.h"
 
 #define TEST_BIT(t) (1U << (t))
+#define MAX_NR_CPUS 1024
 
 static __u64 time_get_ns(void)
 {
@@ -44,6 +45,7 @@ enum test_type {
LPM_KMALLOC,
HASH_LOOKUP,
ARRAY_LOOKUP,
+   INNER_LRU_HASH_PREALLOC,
NR_TESTS,
 };
 
@@ -57,10 +59,14 @@ const char *test_map_names[NR_TESTS] = {
[LPM_KMALLOC] = "lpm_trie_map_alloc",
[HASH_LOOKUP] = "hash_map",
[ARRAY_LOOKUP] = "array_map",
+   [INNER_LRU_HASH_PREALLOC] = "inner_lru_hash_map",
 };
 
 static int test_flags = ~0;
 static uint32_t num_map_entries;
+static uint32_t inner_lru_hash_size;
+static int inner_lru_hash_idx = -1;
+static int array_of_lru_hashs_idx = -1;
 static uint32_t max_cnt = 100;
 
 static int check_test_flags(enum test_type t)
@@ -82,11 +88,42 @@ static void test_hash_prealloc(int cpu)
 
 static void do_test_lru(enum test_type test, int cpu)
 {
+   static int inner_lru_map_fds[MAX_NR_CPUS];
+
struct sockaddr_in6 in6 = { .sin6_family = AF_INET6 };
const char *test_name;
__u64 start_time;
int i, ret;
 
+   if (test == INNER_LRU_HASH_PREALLOC) {
+   int outer_fd = map_fd[array_of_lru_hashs_idx];
+
+   assert(cpu < MAX_NR_CPUS);
+
+   if (cpu) {
+   inner_lru_map_fds[cpu] =
+   bpf_create_map(BPF_MAP_TYPE_LRU_HASH,
+  sizeof(uint32_t), sizeof(long),
+  inner_lru_hash_size, 0);
+   if (inner_lru_map_fds[cpu] == -1) {
+

[PATCH net-next 0/6] bpf: LRU performance and test-program improvements

2017-04-14 Thread Martin KaFai Lau

The first 4 patches make a few improvements to the LRU tests.

Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map.

Patch 6/6 adds an example in using LRU map with map-in-map.

Martin KaFai Lau (6):
  bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU
  bpf: lru: Cleanup test_lru_map.c
  bpf: lru: Refactor LRU map tests in map_perf_test
  bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
  bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
  bpf: lru: Add map-in-map LRU example

 kernel/bpf/bpf_lru_list.c  |   2 +-
 samples/bpf/bpf_load.c | 114 ++---
 samples/bpf/bpf_load.h |  13 ++
 samples/bpf/map_perf_test_kern.c   |  75 +++--
 samples/bpf/map_perf_test_user.c   | 247 +
 tools/testing/selftests/bpf/test_lru_map.c | 104 
 6 files changed, 428 insertions(+), 127 deletions(-)

-- 
2.9.3

[PATCH net-next 3/6] bpf: lru: Refactor LRU map tests in map_perf_test

2017-04-14 Thread Martin KaFai Lau

One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.

One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 samples/bpf/map_perf_test_kern.c | 43 +++-
 samples/bpf/map_perf_test_user.c | 53 +++-
 2 files changed, 67 insertions(+), 29 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 9da2a3441b0a..404ed53b8a53 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -26,7 +26,7 @@ struct bpf_map_def SEC("maps") lru_hash_map = {
.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") percpu_lru_hash_map = {
+struct bpf_map_def SEC("maps") nocommon_lru_hash_map = {
.type = BPF_MAP_TYPE_LRU_HASH,
.key_size = sizeof(u32),
.value_size = sizeof(long),
@@ -100,6 +100,7 @@ int stress_percpu_hmap(struct pt_regs *ctx)
bpf_map_delete_elem(&percpu_hash_map, &key);
return 0;
 }
+
 SEC("kprobe/sys_getgid")
 int stress_hmap_alloc(struct pt_regs *ctx)
 {
@@ -128,24 +129,42 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
return 0;
 }
 
-SEC("kprobe/sys_getpid")
+SEC("kprobe/sys_connect")
 int stress_lru_hmap_alloc(struct pt_regs *ctx)
 {
-   u32 key = bpf_get_prandom_u32();
+   struct sockaddr_in6 *in6;
+   u16 test_case, dst6[8];
+   int addrlen, ret;
+   char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%d\n";
long val = 1;
+   u32 key = bpf_get_prandom_u32();
 
-   bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY);
+   in6 = (struct sockaddr_in6 *)PT_REGS_PARM2(ctx);
+   addrlen = (int)PT_REGS_PARM3(ctx);
 
-   return 0;
-}
+   if (addrlen != sizeof(*in6))
+   return 0;
 
-SEC("kprobe/sys_getppid")
-int stress_percpu_lru_hmap_alloc(struct pt_regs *ctx)
-{
-   u32 key = bpf_get_prandom_u32();
-   long val = 1;
+   ret = bpf_probe_read(dst6, sizeof(dst6), &in6->sin6_addr);
+   if (ret)
+   goto done;
+
+   if (dst6[0] != 0xdead || dst6[1] != 0xbeef)
+   return 0;
+
+   test_case = dst6[7];
+
+   if (test_case == 0)
+   ret = bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY);
+   else if (test_case == 1)
+   ret = bpf_map_update_elem(&nocommon_lru_hash_map, &key, &val,
+ BPF_ANY);
+   else
+   ret = -EINVAL;
 
-   bpf_map_update_elem(&percpu_lru_hash_map, &key, &val, BPF_ANY);
+done:
+   if (ret)
+   bpf_trace_printk(fmt, sizeof(fmt), ret);
 
return 0;
 }
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index e29ff318a793..51cb8f238aa2 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -18,6 +18,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
 #include "libbpf.h"
 #include "bpf_load.h"
 
@@ -36,7 +39,7 @@ static __u64 time_get_ns(void)
 #define HASH_KMALLOC   (1 << 2)
 #define PERCPU_HASH_KMALLOC(1 << 3)
 #define LRU_HASH_PREALLOC  (1 << 4)
-#define PERCPU_LRU_HASH_PREALLOC   (1 << 5)
+#define NOCOMMON_LRU_HASH_PREALLOC (1 << 5)
 #define LPM_KMALLOC(1 << 6)
 #define HASH_LOOKUP(1 << 7)
 #define ARRAY_LOOKUP   (1 << 8)
@@ -55,28 +58,44 @@ static void test_hash_prealloc(int cpu)
   cpu, MAX_CNT * 10ll / (time_get_ns() - start_time));
 }
 
-static void test_lru_hash_prealloc(int cpu)
+static void do_test_lru(int lru_test_flag, int cpu)
 {
+   struct sockaddr_in6 in6 = { .sin6_family = AF_INET6 };
+   const char *test_name;
__u64 start_time;
-   int i;
+   int i, ret;
+
+   in6.sin6_addr.s6_addr16[0] = 0xdead;
+   in6.sin6_addr.s6_addr16[1] = 0xbeef;
+
+   if (lru_test_flag & LRU_HASH_PREALLOC) {
+   test_name = "lru_hash_map_perf";
+   in6.sin6_addr.s6_addr16[7] = 0;
+   } else if (lru_test_flag & NOCOMMON_LRU_HASH_PREALLOC) {
+   test_name = "nocommon_lru_hash_map_perf";
+   in6.sin6_addr.s6_addr16[7] = 1;
+   } else {
+   assert(0);
+   }
 
start_time = time_get_ns();
-   for (i = 0; i < MAX_CNT; i++)
-   syscall(__NR_getpid);
-   printf("%d:lru_hash_map_perf pre-alloc %lld events per sec\n",
-  cpu, MAX_CNT * 10ll / (time_get_ns() - start_time));
+   for (i = 0; i < MAX_CNT; i++) {
+   ret = connect(-1, (const struct sockaddr *)&in6, sizeof(in6));
+   assert(ret == -1 && errno == EBADF);
+

[PATCH net-next 5/6] bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4

2017-04-14 Thread Martin KaFai Lau

After doing map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent in rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 1 100 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 1000 1000 |  awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

By inactive, it usually means the element is not in cache.  Hence,
there is a need to tune the PERCPU_NR_SCANS value.

This patch finds a better number of elements to
scan during each list rotation.  The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements.  This change only
affects the BPF_F_NO_COMMON_LRU map.

The test_lru_dist does not show meaningful difference
between 16 and 4.  Our production L4 load balancer which uses
the LRU map for conntrack-ing also shows little change in cache
hit rate.  Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable if we find a usecase
later that shows another value works better and/or use
a different rotation strategy.

After this change:
> map_perf_test 32 8 1000 1000 |  awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

The test_lru_dist has not shown meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 4 1
nr_misses: 67036 (Before) vs 67031 (After)

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/bpf_lru_list.c  | 2 +-
 tools/testing/selftests/bpf/test_lru_map.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index f62d1d56f41d..e6ef4401a138 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -13,7 +13,7 @@
 #define LOCAL_FREE_TARGET  (128)
 #define LOCAL_NR_SCANS LOCAL_FREE_TARGET
 
-#define PERCPU_FREE_TARGET (16)
+#define PERCPU_FREE_TARGET (4)
 #define PERCPU_NR_SCANSPERCPU_FREE_TARGET
 
 /* Helpers to get the local list index */
diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 87c05e5bdfdd..8c10c9180c1a 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -22,7 +22,7 @@
 #include "bpf_util.h"
 
 #define LOCAL_FREE_TARGET  (128)
-#define PERCPU_FREE_TARGET (16)
+#define PERCPU_FREE_TARGET (4)
 
 static int nr_cpus;
 
-- 
2.9.3

[PATCH net-next 4/6] bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def

2017-04-14 Thread Martin KaFai Lau

The current bpf_map_def is statically defined during compile
time.  This patch allows the *_user.c program to change it during
runtime.  It is done by adding load_bpf_file_fixup_map() which
takes a callback.  The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.

The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However,  it is hard to find one size to fit all testing
environment.  Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.

This patch adds two cmdline args.  One is to configure
the map's max_entries.  Another is to configure the max_cnt
which controls how many times a syscall is called.

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 samples/bpf/bpf_load.c   | 114 +-
 samples/bpf/bpf_load.h   |  13 
 samples/bpf/map_perf_test_user.c | 148 ---
 3 files changed, 201 insertions(+), 74 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index dcdce1270d38..0d449d8032d1 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "libbpf.h"
 #include "bpf_load.h"
 #include "perf-sys.h"
@@ -37,15 +38,6 @@ int event_fd[MAX_PROGS];
 int prog_cnt;
 int prog_array_fd = -1;
 
-struct bpf_map_def {
-   unsigned int type;
-   unsigned int key_size;
-   unsigned int value_size;
-   unsigned int max_entries;
-   unsigned int map_flags;
-   unsigned int inner_map_idx;
-};
-
 static int populate_prog_array(const char *event, int prog_fd)
 {
int ind = atoi(event), err;
@@ -193,11 +185,14 @@ static int load_and_attach(const char *event, struct 
bpf_insn *prog, int size)
return 0;
 }
 
-static int load_maps(struct bpf_map_def *maps, int len)
+static int load_maps(struct bpf_map_def *maps, int len,
+const char **map_names, fixup_map_cb fixup_map)
 {
int i;
 
for (i = 0; i < len / sizeof(struct bpf_map_def); i++) {
+   if (fixup_map)
+   fixup_map(&maps[i], map_names[i], i);
 
if (maps[i].type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
maps[i].type == BPF_MAP_TYPE_HASH_OF_MAPS) {
@@ -280,14 +275,64 @@ static int parse_relo_and_apply(Elf_Data *data, Elf_Data 
*symbols,
return 0;
 }
 
-int load_bpf_file(char *path)
+static int cmp_symbols(const void *l, const void *r)
+{
+   const GElf_Sym *lsym = (const GElf_Sym *)l;
+   const GElf_Sym *rsym = (const GElf_Sym *)r;
+
+   if (lsym->st_value < rsym->st_value)
+   return -1;
+   else if (lsym->st_value > rsym->st_value)
+   return 1;
+   else
+   return 0;
+}
+
+static int get_sorted_map_names(Elf *elf, Elf_Data *symbols, int maps_shndx,
+   int strtabidx, char **map_names)
+{
+   GElf_Sym map_symbols[MAX_MAPS];
+   int i, nr_maps = 0;
+
+   for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+   assert(nr_maps < MAX_MAPS);
+   if (!gelf_getsym(symbols, i, &map_symbols[nr_maps]))
+   continue;
+   if (map_symbols[nr_maps].st_shndx != maps_shndx)
+   continue;
+   nr_maps++;
+   }
+
+   qsort(map_symbols, nr_maps, sizeof(GElf_Sym), cmp_symbols);
+
+   for (i = 0; i < nr_maps; i++) {
+   char *map_name;
+
+   map_name = elf_strptr(elf, strtabidx, map_symbols[i].st_name);
+   if (!map_name) {
+   printf("cannot get map symbol\n");
+   return 1;
+   }
+
+   map_names[i] = strdup(map_name);
+   if (!map_names[i]) {
+   printf("strdup(%s): %s(%d)\n", map_name,
+  strerror(errno), errno);
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
+static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 {
-   int fd, i;
+   int fd, i, ret, maps_shndx = -1, strtabidx = -1;
Elf *elf;
GElf_Ehdr ehdr;
GElf_Shdr shdr, shdr_prog;
-   Elf_Data *data, *data_prog, *symbols = NULL;
-   char *shname, *shname_prog;
+   Elf_Data *data, *data_prog, *data_maps = NULL, *symbols = NULL;
+   char *shname, *shname_prog, *map_names[MAX_MAPS] = { NULL };
 
/* reset global variables */
kern_version = 0;
@@ -335,14 +380,33 @@ int load_bpf_file(char *path)
}
memcpy(&kern_version, data->d_buf, sizeof(int));
} else if (strcmp(shname, "maps") == 0) {
-   processed_sec

[PATCH net-next 2/6] bpf: lru: Cleanup test_lru_map.c

2017-04-14 Thread Martin KaFai Lau

This patch does the following cleanup on test_lru_map.c
1) Fix indentation (Replace spaces by tabs)
2) Remove redundant BPF_F_NO_COMMON_LRU test
3) Simplify some comments

Signed-off-by: Martin KaFai Lau 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 tools/testing/selftests/bpf/test_lru_map.c | 32 +-
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_lru_map.c 
b/tools/testing/selftests/bpf/test_lru_map.c
index 04b2242937cc..87c05e5bdfdd 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -191,12 +191,7 @@ static void test_lru_sanity1(int map_type, int map_flags, 
unsigned int tgt_free)
int next_cpu = 0;
 
if (map_flags & BPF_F_NO_COMMON_LRU)
-   /* Ther percpu lru list (i.e each cpu has its own LRU
-* list) does not have a local free list.  Hence,
-* it will only free old nodes till there is no free
-* from the LRU list.  Hence, this test does not apply
-* to BPF_F_NO_COMMON_LRU
-*/
+   /* This test is only applicable to common LRU list */
return;
 
printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type,
@@ -227,7 +222,7 @@ static void test_lru_sanity1(int map_type, int map_flags, 
unsigned int tgt_free)
for (key = 1; key < end_key; key++) {
assert(!bpf_map_lookup_elem(lru_map_fd, &key, value));
assert(!bpf_map_update_elem(expected_map_fd, &key, value,
-   BPF_NOEXIST));
+   BPF_NOEXIST));
}
 
/* Insert 1+tgt_free to 2*tgt_free
@@ -273,12 +268,7 @@ static void test_lru_sanity2(int map_type, int map_flags, 
unsigned int tgt_free)
int next_cpu = 0;
 
if (map_flags & BPF_F_NO_COMMON_LRU)
-   /* Ther percpu lru list (i.e each cpu has its own LRU
-* list) does not have a local free list.  Hence,
-* it will only free old nodes till there is no free
-* from the LRU list.  Hence, this test does not apply
-* to BPF_F_NO_COMMON_LRU
-*/
+   /* This test is only applicable to common LRU list */
return;
 
printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type,
@@ -290,11 +280,7 @@ static void test_lru_sanity2(int map_type, int map_flags, 
unsigned int tgt_free)
assert(batch_size * 2 == tgt_free);
 
map_size = tgt_free + batch_size;
-   if (map_flags & BPF_F_NO_COMMON_LRU)
-   lru_map_fd = create_map(map_type, map_flags,
-   map_size * nr_cpus);
-   else
-   lru_map_fd = create_map(map_type, map_flags, map_size);
+   lru_map_fd = create_map(map_type, map_flags, map_size);
assert(lru_map_fd != -1);
 
expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size);
@@ -341,7 +327,7 @@ static void test_lru_sanity2(int map_type, int map_flags, 
unsigned int tgt_free)
assert(!bpf_map_lookup_elem(lru_map_fd, &key, value));
assert(value[0] == 4321);
assert(!bpf_map_update_elem(expected_map_fd, &key, value,
-   BPF_NOEXIST));
+   BPF_NOEXIST));
}
 
value[0] = 1234;
@@ -361,7 +347,7 @@ static void test_lru_sanity2(int map_type, int map_flags, 
unsigned int tgt_free)
assert(!bpf_map_update_elem(lru_map_fd, &key, value,
BPF_NOEXIST));
assert(!bpf_map_update_elem(expected_map_fd, &key, value,
-   BPF_NOEXIST));
+   BPF_NOEXIST));
}
 
assert(map_equal(lru_map_fd, expected_map_fd));
@@ -419,7 +405,7 @@ static void test_lru_sanity3(int map_type, int map_flags, 
unsigned int tgt_free)
for (key = 1; key < end_key; key++) {
assert(!bpf_map_lookup_elem(lru_map_fd, &key, value));
assert(!bpf_map_update_elem(expected_map_fd, &key, value,
-   BPF_NOEXIST));
+   BPF_NOEXIST));
}
 
/* Add 1+2*tgt_free to tgt_free*5/2
@@ -431,7 +417,7 @@ static void test_lru_sanity3(int map_type, int map_flags, 
unsigned int tgt_free)
assert(!bpf_map_update_elem(lru_map_fd, &key, value,
BPF_NOEXIST));
assert(!bpf_map_update_elem(expected_map_fd, &key, value,
-   BPF_NOEXIST));
+   BPF_NOEXIST));
}
 
assert(map_equal(lru_map_fd, expected_map_fd));
@@ -491,7 +477,7 @@ static v

Re: [PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes

2017-04-14 Thread Sergei Shtylyov


On 04/14/2017 07:44 PM, Matthias Schiffer wrote:


* Multicast addresses are never valid as local address
* Link-local IPv6 unicast addresses may only be used as remote when the
  local address is link-local as well
* Don't allow link-local IPv6 local/remote addresses without interface

We also store in the flags field if link-local addresses are used for the
follow-up patches that actually make VXLAN over link-local IPv6 work.

Signed-off-by: Matthias Schiffer 
---

v2: was "vxlan: don't allow link-local IPv6 local/remote addresses without
interface" before. v2 does a lot more checks and adds the
VXLAN_F_IPV6_LINKLOCAL flag.

 drivers/net/vxlan.c | 35 +++
 include/net/vxlan.h |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 07f89b037681..95a71546e8f2 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, 
struct vxlan_config *conf,
if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family)
return -EINVAL;

+   if (vxlan_addr_multicast(&conf->saddr))
+   return -EINVAL;
+
if (conf->saddr.sa.sa_family == AF_INET6) {
if (!IS_ENABLED(CONFIG_IPV6))
return -EPFNOSUPPORT;
use_ipv6 = true;
conf->flags |= VXLAN_F_IPV6;
+
+   if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) {
+   int local_type =
+   ipv6_addr_type(&conf->saddr.sin6.sin6_addr);
+   int remote_type =
+   ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr);
+
+   if (local_type & IPV6_ADDR_LINKLOCAL) {
+   if (!(remote_type & IPV6_ADDR_LINKLOCAL) &&
+   (remote_type != IPV6_ADDR_ANY)) {
+   pr_info("invalid combination of address 
scopes\n");


   Maybe pr_err()?


+   return -EINVAL;
+   }
+
+   conf->flags |= VXLAN_F_IPV6_LINKLOCAL;
+   } else {
+   if (remote_type ==
+   (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) {
+   pr_info("invalid combination of address 
scopes\n");


   Here as well...


+   return -EINVAL;
+   }
+
+   conf->flags &= ~VXLAN_F_IPV6_LINKLOCAL;
+   }
+   }
}

if (conf->label && !use_ipv6) {

[...]

MBR, Sergei

Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode

2017-04-14 Thread Alexander Duyck

On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala
 wrote:
> In switchdev mode, broadcasts from VFs are received by the PF and passed
> to corresponding port representor netdev.
> Any frames sent via port netdevs are sent as directed transmits to the
> corresponding VFs. To enable directed transmit, skb metadata dst is used
> to pass the port id and the frame is requeued to call the PFs transmit
> routine. VF id is used as port id for VFs and PF port id is defined as
> I40_MAIN_VSI_PORT_ID.
>
> Small script to demonstrate inter VF and PF to VF pings in switchdev mode.
> PF: p4p1, VFs: p4p1_0,p4p1_1 VF Port Reps:p4p1-vf0, p4p1-vf1
> PF Port rep: p4p1-pf
>
> # rmmod i40e; modprobe i40e
> # devlink dev eswitch set pci/:05:00.0 mode switchdev
> # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
> # ip link set p4p1 vf 0 mac 00:11:22:33:44:55
> # ip link set p4p1 vf 1 mac 00:11:22:33:44:56
> # rmmod i40evf; modprobe i40evf
>
> /* Create 2 namespaces and move the VFs to the corresponding ns */
> # ip netns add ns0
> # ip link set p4p1_0 netns ns0
> # ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
> # ip netns exec ns0 ip link set p4p1_0 up
> # ip netns add ns1
> # ip link set p4p1_1 netns ns1
> # ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
> # ip netns exec ns1 ip link set p4p1_1 up
>
> /* bring up pf and port netdevs */
> # ip addr add 192.168.1.1/24 dev p4p1
> # ip link set p4p1 up
> # ip link set p4p1-vf0 up
> # ip link set p4p1-vf1 up
> # ip link set p4p1-pf up
>
> # ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
> # ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
> # ping -c3 192.168.1.10   /* PF -> VF0 */
> # ping -c3 192.168.1.11   /* PF -> VF1 */
>
> /* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
> # ip netns exec ns0 ping -c1 -W1 192.168.1.200
> /* VF1 -> IP in same subnet -  broadcasts will be seen on p4p1-vf1 & p4p1*/
> # ip netns exec ns0 ping -c1 -W1 192.168.1.200
> /* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
> # ping -I p4p1-vf0 -c1 -W1 192.168.1.200
> /* port rep VF1 -> IP in same subnet  - broadcasts will be seen on p4p1_1 */
> # ping -I p4p1-vf1 -c1 -W1 192.168.1.200
>
> Signed-off-by: Sridhar Samudrala 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h |   4 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c|  27 +++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c| 148 
> -
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h|   2 +
>  drivers/net/ethernet/intel/i40e/i40e_type.h|   3 +
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
>  6 files changed, 184 insertions(+), 8 deletions(-)
>



> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index ebffca0..86d2510 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring 
> *rx_ring,
>  }
>
>  /**
> + * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame
> + * @rx_ring: rx ring in play
> + * @skb: packet to send up
> + **/
> +static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff 
> *skb)
> +{
> +   struct i40e_q_vector *q_vector = rx_ring->q_vector;
> +   struct i40e_pf *pf = rx_ring->vsi->back;
> +   struct sk_buff *nskb;
> +   struct i40e_vf *vf;
> +   struct ethhdr *eth;
> +   int vf_id;
> +
> +   if ((skb->pkt_type != PACKET_BROADCAST) &&
> +   (skb->pkt_type != PACKET_MULTICAST) &&
> +   (skb->pkt_type != PACKET_OTHERHOST))
> +   return;
> +
> +   eth = (struct ethhdr *)skb_mac_header(skb);
> +
> +   /* If a loopback packet is received in switchdev mode, clone the skb
> +* and pass it to the corresponding port netdev based on the source 
> MAC.
> +*/
> +   for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
> +   vf = &pf->vf[vf_id];
> +   if (ether_addr_equal(eth->h_source,
> +vf->default_lan_addr.addr)) {
> +   nskb = skb_clone(skb, GFP_ATOMIC);
> +   if (!nskb)
> +   break;
> +   nskb->offload_fwd_mark = 1;

So this line is causing build errors when switchdev is not enabled.
This whole function should probably be wrapped in a check to see if
switchdev support is enabled or not.

> +   nskb->dev = vf->port_netdev;
> +   napi_gro_receive(&q_vector->napi, nskb);
> +   break;
> +   }
> +   }
> +}
> +
> +/**

[PATCH net-next v2 3/6] vxlan: improve validation of address family configuration

2017-04-14 Thread Matthias Schiffer

Address families of source and destination addresses must match, and
changelink operations can't change the address family.

In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses
IPv4 or IPv6.

Signed-off-by: Matthias Schiffer 
---

v2: new patch

 drivers/net/vxlan.c | 39 ---
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bdd19e5037b0..07f89b037681 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2459,10 +2459,7 @@ static int vxlan_change_mtu(struct net_device *dev, int 
new_mtu)
struct vxlan_rdst *dst = &vxlan->default_dst;
struct net_device *lowerdev = __dev_get_by_index(vxlan->net,
 dst->remote_ifindex);
-   bool use_ipv6 = false;
-
-   if (dst->remote_ip.sa.sa_family == AF_INET6)
-   use_ipv6 = true;
+   bool use_ipv6 = !!(vxlan->cfg.flags & VXLAN_F_IPV6);
 
/* This check is different than dev->max_mtu, because it looks at
 * the lowerdev->mtu, rather than the static dev->max_mtu
@@ -2871,11 +2868,20 @@ static int vxlan_config_validate(struct net *src_net, 
struct vxlan_config *conf,
}
}
 
-   if (!conf->remote_ip.sa.sa_family)
+   if (!conf->remote_ip.sa.sa_family && !conf->saddr.sa.sa_family) {
+   /* Unless IPv6 is explicitly requested, assume IPv4 */
conf->remote_ip.sa.sa_family = AF_INET;
+   conf->saddr.sa.sa_family = AF_INET;
+   } else if (!conf->remote_ip.sa.sa_family) {
+   conf->remote_ip.sa.sa_family = conf->saddr.sa.sa_family;
+   } else if (!conf->saddr.sa.sa_family) {
+   conf->saddr.sa.sa_family = conf->remote_ip.sa.sa_family;
+   }
+
+   if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family)
+   return -EINVAL;
 
-   if (conf->remote_ip.sa.sa_family == AF_INET6 ||
-   conf->saddr.sa.sa_family == AF_INET6) {
+   if (conf->saddr.sa.sa_family == AF_INET6) {
if (!IS_ENABLED(CONFIG_IPV6))
return -EPFNOSUPPORT;
use_ipv6 = true;
@@ -2932,11 +2938,9 @@ static int vxlan_config_validate(struct net *src_net, 
struct vxlan_config *conf,
continue;
 
if (tmp->cfg.vni == conf->vni &&
-   (tmp->default_dst.remote_ip.sa.sa_family == AF_INET6 ||
-tmp->cfg.saddr.sa.sa_family == AF_INET6) == use_ipv6 &&
tmp->cfg.dst_port == conf->dst_port &&
-   (tmp->cfg.flags & VXLAN_F_RCV_FLAGS) ==
-   (conf->flags & VXLAN_F_RCV_FLAGS)) {
+   (tmp->cfg.flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)) ==
+   (conf->flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6))) {
pr_info("duplicate VNI %u\n", be32_to_cpu(conf->vni));
return -EEXIST;
}
@@ -3069,22 +3073,35 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
}
 
if (data[IFLA_VXLAN_GROUP]) {
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET))
+   return -EOPNOTSUPP;
+
conf->remote_ip.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
+   conf->remote_ip.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_GROUP6]) {
if (!IS_ENABLED(CONFIG_IPV6))
return -EPFNOSUPPORT;
 
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6))
+   return -EOPNOTSUPP;
+
conf->remote_ip.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
conf->remote_ip.sa.sa_family = AF_INET6;
}
 
if (data[IFLA_VXLAN_LOCAL]) {
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET))
+   return -EOPNOTSUPP;
+
conf->saddr.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
conf->saddr.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_LOCAL6]) {
if (!IS_ENABLED(CONFIG_IPV6))
return -EPFNOSUPPORT;
 
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET6))
+   return -EOPNOTSUPP;
+
/* TODO: respect scope id */
conf->saddr.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
conf->saddr.sa.sa_family = AF_INET6;
-- 
2.12.2

[PATCH net-next v2 2/6] vxlan: get rid of redundant vxlan_dev.flags

2017-04-14 Thread Matthias Schiffer

There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer 
---

v2: new patch

 drivers/net/vxlan.c   | 76 +--
 include/net/vxlan.h   |  1 -
 net/openvswitch/vport-vxlan.c |  4 +--
 3 files changed, 39 insertions(+), 42 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 86471e961708..bdd19e5037b0 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -303,7 +303,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct 
vxlan_dev *vxlan,
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
goto nla_put_failure;
-   if ((vxlan->flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
+   if ((vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
nla_put_u32(skb, NDA_SRC_VNI,
be32_to_cpu(fdb->vni)))
goto nla_put_failure;
@@ -417,7 +417,7 @@ static u32 eth_vni_hash(const unsigned char *addr, __be32 
vni)
 static inline struct hlist_head *vxlan_fdb_head(struct vxlan_dev *vxlan,
const u8 *mac, __be32 vni)
 {
-   if (vxlan->flags & VXLAN_F_COLLECT_METADATA)
+   if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA)
return &vxlan->fdb_head[eth_vni_hash(mac, vni)];
else
return &vxlan->fdb_head[eth_hash(mac)];
@@ -432,7 +432,7 @@ static struct vxlan_fdb *__vxlan_find_mac(struct vxlan_dev 
*vxlan,
 
hlist_for_each_entry_rcu(f, head, hlist) {
if (ether_addr_equal(mac, f->eth_addr)) {
-   if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
+   if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) {
if (vni == f->vni)
return f;
} else {
@@ -1266,7 +1266,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
 #endif
}
 
-   if ((vxlan->flags & VXLAN_F_LEARN) &&
+   if ((vxlan->cfg.flags & VXLAN_F_LEARN) &&
vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni))
return false;
 
@@ -1489,7 +1489,7 @@ static int arp_reduce(struct net_device *dev, struct 
sk_buff *skb, __be32 vni)
 
if (netif_rx_ni(reply) == NET_RX_DROP)
dev->stats.rx_dropped++;
-   } else if (vxlan->flags & VXLAN_F_L3MISS) {
+   } else if (vxlan->cfg.flags & VXLAN_F_L3MISS) {
union vxlan_addr ipa = {
.sin.sin_addr.s_addr = tip,
.sin.sin_family = AF_INET,
@@ -1649,7 +1649,7 @@ static int neigh_reduce(struct net_device *dev, struct 
sk_buff *skb, __be32 vni)
if (netif_rx_ni(reply) == NET_RX_DROP)
dev->stats.rx_dropped++;
 
-   } else if (vxlan->flags & VXLAN_F_L3MISS) {
+   } else if (vxlan->cfg.flags & VXLAN_F_L3MISS) {
union vxlan_addr ipa = {
.sin6.sin6_addr = msg->target,
.sin6.sin6_family = AF_INET6,
@@ -1682,7 +1682,7 @@ static bool route_shortcircuit(struct net_device *dev, 
struct sk_buff *skb)
return false;
pip = ip_hdr(skb);
n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
-   if (!n && (vxlan->flags & VXLAN_F_L3MISS)) {
+   if (!n && (vxlan->cfg.flags & VXLAN_F_L3MISS)) {
union vxlan_addr ipa = {
.sin.sin_addr.s_addr = pip->daddr,
.sin.sin_family = AF_INET,
@@ -1703,7 +1703,7 @@ static bool route_shortcircuit(struct net_device *dev, 
struct sk_buff *skb)
return false;
pip6 = ipv6_hdr(skb);
n = neigh_lookup(ipv6_stub->nd_tbl, &pip6->daddr, dev);
-   if (!n && (vxlan->flags & VXLAN_F_L3MISS)) {
+   if (!n && (vxlan->cfg.flags & VXLAN_F_L3MISS)) {
union vxlan_addr ipa = {
.sin6.sin6_addr = pip6->daddr,
.sin6.sin6_family = AF_INET6,
@@ -1977,7 +1977,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
 #endif
}
 
-   if (dst_vxlan->flags & VXLAN_F_LEARN)
+   if (dst_vxlan->cfg.flags & VXLAN_F_LEARN)
vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, vni);
 
u64_stats_update_begin(&tx_stats->syncp);
@@ -2015,7 +2015,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, 
struct net_device *dev,
dst_release(dst);
dst_vxlan = vxlan_find_vni(vxlan->net, vni,
   daddr->sa.sa_family, dst_port,
-  vxlan->flags);
+

[PATCH net-next v2 0/6] vxlan: cleanup and IPv6 link-local support

2017-04-14 Thread Matthias Schiffer

Running VXLANs over IPv6 link-local addresses allows to use them as a
drop-in replacement for VLANs, avoiding to allocate additional outer IP
addresses to run the VXLAN over.

Since v1, I have added a lot more consistency checks to the address
configuration, making sure address families and scopes match. To simplify
the implementation, I also did some general refactoring of the
configuration handling in the new first patch of the series.

The second patch is more cleanup; is slightly touches OVS code, so that
list is in CC this time, too.

As in v1, the last two patches actually make VXLAN over IPv6 link-local
work, and allow multiple VXLANs wit the same VNI and port, as long as
link-local addresses on different interfaces are used. As suggested, I now
store in the flags field if the VXLAN uses link-local addresses or not.


Matthias Schiffer (6):
  vxlan: refactor verification and application of configuration
  vxlan: get rid of redundant vxlan_dev.flags
  vxlan: improve validation of address family configuration
  vxlan: check valid combinations of address scopes
  vxlan: fix snooping for link-local IPv6 addresses
  vxlan: allow multiple VXLANs with same VNI for IPv6 link-local
addresses

 drivers/net/vxlan.c   | 411 ++
 include/net/vxlan.h   |   3 +-
 net/openvswitch/vport-vxlan.c |   4 +-
 3 files changed, 263 insertions(+), 155 deletions(-)

-- 
2.12.2

[PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes

2017-04-14 Thread Matthias Schiffer

* Multicast addresses are never valid as local address
* Link-local IPv6 unicast addresses may only be used as remote when the
  local address is link-local as well
* Don't allow link-local IPv6 local/remote addresses without interface

We also store in the flags field if link-local addresses are used for the
follow-up patches that actually make VXLAN over link-local IPv6 work.

Signed-off-by: Matthias Schiffer 
---

v2: was "vxlan: don't allow link-local IPv6 local/remote addresses without
interface" before. v2 does a lot more checks and adds the
VXLAN_F_IPV6_LINKLOCAL flag.

 drivers/net/vxlan.c | 35 +++
 include/net/vxlan.h |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 07f89b037681..95a71546e8f2 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, 
struct vxlan_config *conf,
if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family)
return -EINVAL;
 
+   if (vxlan_addr_multicast(&conf->saddr))
+   return -EINVAL;
+
if (conf->saddr.sa.sa_family == AF_INET6) {
if (!IS_ENABLED(CONFIG_IPV6))
return -EPFNOSUPPORT;
use_ipv6 = true;
conf->flags |= VXLAN_F_IPV6;
+
+   if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) {
+   int local_type =
+   ipv6_addr_type(&conf->saddr.sin6.sin6_addr);
+   int remote_type =
+   ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr);
+
+   if (local_type & IPV6_ADDR_LINKLOCAL) {
+   if (!(remote_type & IPV6_ADDR_LINKLOCAL) &&
+   (remote_type != IPV6_ADDR_ANY)) {
+   pr_info("invalid combination of address 
scopes\n");
+   return -EINVAL;
+   }
+
+   conf->flags |= VXLAN_F_IPV6_LINKLOCAL;
+   } else {
+   if (remote_type ==
+   (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) {
+   pr_info("invalid combination of address 
scopes\n");
+   return -EINVAL;
+   }
+
+   conf->flags &= ~VXLAN_F_IPV6_LINKLOCAL;
+   }
+   }
}
 
if (conf->label && !use_ipv6) {
@@ -2920,6 +2948,13 @@ static int vxlan_config_validate(struct net *src_net, 
struct vxlan_config *conf,
return -EINVAL;
}
 
+#if IS_ENABLED(CONFIG_IPV6)
+   if (conf->flags & VXLAN_F_IPV6_LINKLOCAL) {
+   pr_info("link-local local/remote addresses require 
interface to be specified\n");
+   return -EINVAL;
+   }
+#endif
+
*lower = NULL;
}
 
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 479bb75789ea..b816a0a6686e 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -258,6 +258,7 @@ struct vxlan_dev {
 #define VXLAN_F_REMCSUM_NOPARTIAL  0x1000
 #define VXLAN_F_COLLECT_METADATA   0x2000
 #define VXLAN_F_GPE0x4000
+#define VXLAN_F_IPV6_LINKLOCAL 0x8000
 
 /* Flags that are used in the receive path. These flags must match in
  * order for a socket to be shareable
@@ -272,6 +273,7 @@ struct vxlan_dev {
 /* Flags that can be set together with VXLAN_F_GPE. */
 #define VXLAN_F_ALLOWED_GPE(VXLAN_F_GPE |  \
 VXLAN_F_IPV6 | \
+VXLAN_F_IPV6_LINKLOCAL |   \
 VXLAN_F_UDP_ZERO_CSUM_TX | \
 VXLAN_F_UDP_ZERO_CSUM6_TX |\
 VXLAN_F_UDP_ZERO_CSUM6_RX |\
-- 
2.12.2

[PATCH net-next v2 6/6] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses

2017-04-14 Thread Matthias Schiffer

As link-local addresses are only valid for a single interface, we can allow
to use the same VNI for multiple independent VXLANs, as long as the used
interfaces are distinct. This way, VXLANs can always be used as a drop-in
replacement for VLANs with greater ID space.

This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.

Signed-off-by: Matthias Schiffer 
---

v2: use VXLAN_F_IPV6_LINKLOCAL in vxlan_vs_find_vni; rebase.

 drivers/net/vxlan.c | 73 ++---
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2995b57a1551..9054245f0770 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -224,7 +224,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, 
sa_family_t family,
return NULL;
 }
 
-static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni)
+static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, int ifindex,
+  __be32 vni)
 {
struct vxlan_dev *vxlan;
 
@@ -233,17 +234,27 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct 
vxlan_sock *vs, __be32 vni)
vni = 0;
 
hlist_for_each_entry_rcu(vxlan, vni_head(vs, vni), hlist) {
-   if (vxlan->default_dst.remote_vni == vni)
-   return vxlan;
+   if (vxlan->default_dst.remote_vni != vni)
+   continue;
+
+   if (IS_ENABLED(CONFIG_IPV6)) {
+   const struct vxlan_config *cfg = &vxlan->cfg;
+
+   if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) &&
+   cfg->remote_ifindex != ifindex)
+   continue;
+   }
+
+   return vxlan;
}
 
return NULL;
 }
 
 /* Look up VNI in a per net namespace table */
-static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni,
-   sa_family_t family, __be16 port,
-   u32 flags)
+static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
+   __be32 vni, sa_family_t family,
+   __be16 port, u32 flags)
 {
struct vxlan_sock *vs;
 
@@ -251,7 +262,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, 
__be32 vni,
if (!vs)
return NULL;
 
-   return vxlan_vs_find_vni(vs, vni);
+   return vxlan_vs_find_vni(vs, ifindex, vni);
 }
 
 /* Fill in neighbour message in skbuff. */
@@ -1342,7 +1353,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 
vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
 
-   vxlan = vxlan_vs_find_vni(vs, vni);
+   vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni);
if (!vxlan)
goto drop;
 
@@ -2006,8 +2017,10 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
 }
 
 static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
-struct vxlan_dev *vxlan, union vxlan_addr 
*daddr,
-__be16 dst_port, __be32 vni, struct dst_entry 
*dst,
+struct vxlan_dev *vxlan,
+union vxlan_addr *daddr,
+__be16 dst_port, int dst_ifindex, __be32 vni,
+struct dst_entry *dst,
 u32 rt_flags)
 {
 #if IS_ENABLED(CONFIG_IPV6)
@@ -2023,7 +2036,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, 
struct net_device *dev,
struct vxlan_dev *dst_vxlan;
 
dst_release(dst);
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
+   dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
   daddr->sa.sa_family, dst_port,
   vxlan->cfg.flags);
if (!dst_vxlan) {
@@ -2055,6 +2068,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct dst_entry *ndst = NULL;
__be32 vni, label;
__u8 tos, ttl;
+   int ifindex;
int err;
u32 flags = vxlan->cfg.flags;
bool udp_sum = false;
@@ -2075,6 +2089,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
dst_port = rdst->remote_port ? rdst->remote_port : 
vxlan->cfg.dst_port;
vni = (rdst->remote_vni) ? : default_vni;
+   ifindex = rdst->remote_ifindex;
local_ip = vxlan->cfg.saddr;
dst_cache = &rdst->dst_cache;
md->gbp = skb->mark;
@@ -2108,6 +2123,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,

[PATCH net-next v2 5/6] vxlan: fix snooping for link-local IPv6 addresses

2017-04-14 Thread Matthias Schiffer

If VXLAN is run over link-local IPv6 addresses, it is necessary to store
the ifindex in the FDB entries. Otherwise, the used interface is undefined
and unicast communication will most likely fail.

Signed-off-by: Matthias Schiffer 
---

v2: use u32 instead of __u32

 drivers/net/vxlan.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 95a71546e8f2..2995b57a1551 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -941,16 +941,24 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb,
  */
 static bool vxlan_snoop(struct net_device *dev,
union vxlan_addr *src_ip, const u8 *src_mac,
-   __be32 vni)
+   u32 src_ifindex, __be32 vni)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
struct vxlan_fdb *f;
+   u32 ifindex = 0;
+
+#if IS_ENABLED(CONFIG_IPV6)
+   if (src_ip->sa.sa_family == AF_INET6 &&
+   (ipv6_addr_type(&src_ip->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL))
+   ifindex = src_ifindex;
+#endif
 
f = vxlan_find_mac(vxlan, src_mac, vni);
if (likely(f)) {
struct vxlan_rdst *rdst = first_remote_rcu(f);
 
-   if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip)))
+   if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip) &&
+  rdst->remote_ifindex == ifindex))
return false;
 
/* Don't migrate static entries, drop packets */
@@ -977,7 +985,7 @@ static bool vxlan_snoop(struct net_device *dev,
 vxlan->cfg.dst_port,
 vni,
 vxlan->default_dst.remote_vni,
-0, NTF_SELF);
+ifindex, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
}
 
@@ -1246,6 +1254,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
  struct sk_buff *skb, __be32 vni)
 {
union vxlan_addr saddr;
+   u32 ifindex = skb->dev->ifindex;
 
skb_reset_mac_header(skb);
skb->protocol = eth_type_trans(skb, vxlan->dev);
@@ -1267,7 +1276,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
}
 
if ((vxlan->cfg.flags & VXLAN_F_LEARN) &&
-   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni))
+   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, ifindex, vni))
return false;
 
return true;
@@ -1978,7 +1987,8 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
}
 
if (dst_vxlan->cfg.flags & VXLAN_F_LEARN)
-   vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, vni);
+   vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, 0,
+   vni);
 
u64_stats_update_begin(&tx_stats->syncp);
tx_stats->tx_packets++;
-- 
2.12.2

[PATCH net-next v2 1/6] vxlan: refactor verification and application of configuration

2017-04-14 Thread Matthias Schiffer

The vxlan_dev_configure function was mixing validation and application of
the vxlan configuration; this could easily lead to bugs with the changelink
operation, as it was hard to see if the function wcould return an error
after parts of the configuration had already been applied.

This commit splits validation and application out of vxlan_dev_configure as
separate functions to make it clearer where error returns are allowed and
where the vxlan_dev or net_device may be configured.

In addition, some validation is moved to vxlan_validate and some
initialization to vxlan_setup where this improves grouping of similar
settings.

Finally, this also fixes two actual bugs:

* if set, conf->mtu would overwrite dev->mtu in each changelink operation,
  reverting other changes of dev->mtu
* the "if (!conf->dst_port)" branch would never be run, as conf->dst_port
  was set in vxlan_setup before. This caused VXLAN-GPE to use the same
  default port as other VXLAN sockets instead of the intended IANA-assigned
  4790.

Signed-off-by: Matthias Schiffer 
---

v2: new patch

 drivers/net/vxlan.c | 202 +---
 1 file changed, 114 insertions(+), 88 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ebc98bb17a51..86471e961708 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2598,6 +2598,10 @@ static void vxlan_setup(struct net_device *dev)
netif_keep_dst(dev);
dev->priv_flags |= IFF_NO_QUEUE;
 
+   /* MTU range: 68 - 65535 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = ETH_MAX_MTU;
+
INIT_LIST_HEAD(&vxlan->next);
spin_lock_init(&vxlan->hash_lock);
 
@@ -2605,9 +2609,8 @@ static void vxlan_setup(struct net_device *dev)
vxlan->age_timer.function = vxlan_cleanup;
vxlan->age_timer.data = (unsigned long) vxlan;
 
-   vxlan->cfg.dst_port = htons(vxlan_port);
-
vxlan->dev = dev;
+   vxlan->net = dev_net(dev);
 
gro_cells_init(&vxlan->gro_cells, dev);
 
@@ -2676,11 +2679,19 @@ static int vxlan_validate(struct nlattr *tb[], struct 
nlattr *data[])
}
}
 
+   if (tb[IFLA_MTU]) {
+   u32 mtu = nla_get_u32(data[IFLA_MTU]);
+
+   if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)
+   return -EINVAL;
+   }
+
if (!data)
return -EINVAL;
 
if (data[IFLA_VXLAN_ID]) {
-   __u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
+   u32 id = nla_get_u32(data[IFLA_VXLAN_ID]);
+
if (id >= VXLAN_N_VID)
return -ERANGE;
}
@@ -2839,55 +2850,36 @@ static int vxlan_sock_add(struct vxlan_dev *vxlan)
return ret;
 }
 
-static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
-  struct vxlan_config *conf,
-  bool changelink)
+static int vxlan_config_validate(struct net *src_net, struct vxlan_config 
*conf,
+struct net_device **lower,
+struct vxlan_dev *old)
 {
struct vxlan_net *vn = net_generic(src_net, vxlan_net_id);
-   struct vxlan_dev *vxlan = netdev_priv(dev), *tmp;
-   struct vxlan_rdst *dst = &vxlan->default_dst;
-   unsigned short needed_headroom = ETH_HLEN;
+   struct vxlan_dev *tmp;
bool use_ipv6 = false;
-   __be16 default_port = vxlan->cfg.dst_port;
-   struct net_device *lowerdev = NULL;
 
-   if (!changelink) {
-   if (conf->flags & VXLAN_F_GPE) {
-   /* For now, allow GPE only together with
-* COLLECT_METADATA. This can be relaxed later; in such
-* case, the other side of the PtP link will have to be
-* provided.
-*/
-   if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) ||
-   !(conf->flags & VXLAN_F_COLLECT_METADATA)) {
-   pr_info("unsupported combination of 
extensions\n");
-   return -EINVAL;
-   }
-   vxlan_raw_setup(dev);
-   } else {
-   vxlan_ether_setup(dev);
+   if (conf->flags & VXLAN_F_GPE) {
+   /* For now, allow GPE only together with
+* COLLECT_METADATA. This can be relaxed later; in such
+* case, the other side of the PtP link will have to be
+* provided.
+*/
+   if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) ||
+   !(conf->flags & VXLAN_F_COLLECT_METADATA)) {
+   pr_info("unsupported combination of extensions\n");
+   return -EINVAL;
}
-
-   /* MTU range: 68 - 65535 */
-   dev->min_mtu = ETH_MIN_MTU;
-   dev->max_mtu = ETH_MAX_MTU;
-

Re: How to debug DMAR errors?

2017-04-14 Thread Ben Greear




On 04/14/2017 09:24 AM, Alexander Duyck wrote:

On Fri, Apr 14, 2017 at 9:19 AM, Ben Greear  wrote:



On 04/14/2017 08:45 AM, Alexander Duyck wrote:


On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear 
wrote:


Hello,

I have been seeing a regular occurrence of DMAR errors, looking something
like this when testing my ath10k driver/firmware under some specific
loads
(maximum receive of 512 byte frames in AP mode):

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault
reason
06] PTE Read access is not set
ath10k_pci :05:00.0: firmware crashed! (uuid
594b1393-ae35-42b5-9dec-74ff0c6791ff)

So, I am wondering if there is any way I can get more information about
what
this fd99f000 address
is?

Once this problem hits, the entire OS locks hard (not even sysrq-boot
will
do anything),
so I guess I would need the DMAR logic to print out more info on that
address somehow.

Thanks,
Ben



There isn't much more info to give you. The problem is that the device
at 5:00.0 attempted to read at fd99f000 even though it didn't have
permissions. In response this should trigger a PCI Master Abort
message to that function. It looks like the firmware for the device
doesn't handle that and so that is likely why things got hung.

Really you would need to interrogate the ath10k_pci to see if there
is/was a mapping somewhere for that address and what it was supposed
to be used for.



I'm working on a hook in DMAR logic to call into ath10k_pci when the
error is seen, so the ath10k can dump debug info, including recent DMA
addresses.

My code is an awful hack so far, but if someone could add a clean way to
register
DMAR error callbacks, I think that would be very welcome.  It might could
tie into
automated dma map/unmap debugging logic, and at the least, someone could
write custom debugging callbacks
for the driver(s) in question.

Thanks,
Ben



You might look at coding up something to add pci_error_handlers for
the pci_driver in the ath10k_pci driver. The PCI Master Abort should
trigger an error that you could then capture in the driver and handle
at least dumping it via your own implementation of the error handlers.
If nothing else I suspect there are probably some sort of descriptor
rings you could probably dump. I'm suspecting this is some sort of Tx
issue since the problem was a read fault, but I suppose there are
other paths in the driver that might trigger DMA read requests.


This is a thick firmware driver, so the firmware could also be screwing up
and accessing something it should not.  There are some existing work-arounds
in it to deal with sketchy behaviour already, maybe more are needed.

Anyway, once I added the debugging code, I didn't see it crash again, so
might be a while before I know more.

Thanks,
Ben



- Alex



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-14 Thread William Tu

On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>
[snip]

> +static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> +struct bpf_prog *xdp_prog)
> +{
> +   struct xdp_buff xdp;
> +   u32 act = XDP_DROP;
> +   void *orig_data;
> +   int hlen, off;
> +
> +   if (skb_linearize(skb))
> +   goto do_drop;
> +
> +   /* The XDP program wants to see the packet starting at the MAC
> +* header.
> +*/
> +   hlen = skb_headlen(skb) + skb->mac_len;
> +   xdp.data = skb->data - skb->mac_len;
> +   xdp.data_end = xdp.data + hlen;
> +   xdp.data_hard_start = xdp.data - skb_headroom(skb);
> +   orig_data = xdp.data;
> +
> +   act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> +   off = xdp.data - orig_data;

should we do "orig_data - xdp.data" instead?
The 'off' might be < 0, when pushing new header.

> +   if (off)
> +   __skb_push(skb, off);

When doing encapsulation by calling xdp_adjust_head(skb, offset), the
offset is < 0 in order to make extra room in the front ; so xdp.data <
orig_data, off < 0.

But if we are decapsulating protocol, then off > 0, and maybe we
should call __skb_pull()?

Thanks,
William

Re: How to debug DMAR errors?

2017-04-14 Thread Alexander Duyck

On Fri, Apr 14, 2017 at 9:19 AM, Ben Greear  wrote:
>
>
> On 04/14/2017 08:45 AM, Alexander Duyck wrote:
>>
>> On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear 
>> wrote:
>>>
>>> Hello,
>>>
>>> I have been seeing a regular occurrence of DMAR errors, looking something
>>> like this when testing my ath10k driver/firmware under some specific
>>> loads
>>> (maximum receive of 512 byte frames in AP mode):
>>>
>>> DMAR: DRHD: handling fault status reg 3
>>> DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault
>>> reason
>>> 06] PTE Read access is not set
>>> ath10k_pci :05:00.0: firmware crashed! (uuid
>>> 594b1393-ae35-42b5-9dec-74ff0c6791ff)
>>>
>>> So, I am wondering if there is any way I can get more information about
>>> what
>>> this fd99f000 address
>>> is?
>>>
>>> Once this problem hits, the entire OS locks hard (not even sysrq-boot
>>> will
>>> do anything),
>>> so I guess I would need the DMAR logic to print out more info on that
>>> address somehow.
>>>
>>> Thanks,
>>> Ben
>>
>>
>> There isn't much more info to give you. The problem is that the device
>> at 5:00.0 attempted to read at fd99f000 even though it didn't have
>> permissions. In response this should trigger a PCI Master Abort
>> message to that function. It looks like the firmware for the device
>> doesn't handle that and so that is likely why things got hung.
>>
>> Really you would need to interrogate the ath10k_pci to see if there
>> is/was a mapping somewhere for that address and what it was supposed
>> to be used for.
>
>
> I'm working on a hook in DMAR logic to call into ath10k_pci when the
> error is seen, so the ath10k can dump debug info, including recent DMA
> addresses.
>
> My code is an awful hack so far, but if someone could add a clean way to
> register
> DMAR error callbacks, I think that would be very welcome.  It might could
> tie into
> automated dma map/unmap debugging logic, and at the least, someone could
> write custom debugging callbacks
> for the driver(s) in question.
>
> Thanks,
> Ben
>

You might look at coding up something to add pci_error_handlers for
the pci_driver in the ath10k_pci driver. The PCI Master Abort should
trigger an error that you could then capture in the driver and handle
at least dumping it via your own implementation of the error handlers.
If nothing else I suspect there are probably some sort of descriptor
rings you could probably dump. I'm suspecting this is some sort of Tx
issue since the problem was a read fault, but I suppose there are
other paths in the driver that might trigger DMA read requests.

- Alex

Re: How to debug DMAR errors?

2017-04-14 Thread Ben Greear




On 04/14/2017 08:45 AM, Alexander Duyck wrote:

On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear  wrote:

Hello,

I have been seeing a regular occurrence of DMAR errors, looking something
like this when testing my ath10k driver/firmware under some specific loads
(maximum receive of 512 byte frames in AP mode):

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason
06] PTE Read access is not set
ath10k_pci :05:00.0: firmware crashed! (uuid
594b1393-ae35-42b5-9dec-74ff0c6791ff)

So, I am wondering if there is any way I can get more information about what
this fd99f000 address
is?

Once this problem hits, the entire OS locks hard (not even sysrq-boot will
do anything),
so I guess I would need the DMAR logic to print out more info on that
address somehow.

Thanks,
Ben


There isn't much more info to give you. The problem is that the device
at 5:00.0 attempted to read at fd99f000 even though it didn't have
permissions. In response this should trigger a PCI Master Abort
message to that function. It looks like the firmware for the device
doesn't handle that and so that is likely why things got hung.

Really you would need to interrogate the ath10k_pci to see if there
is/was a mapping somewhere for that address and what it was supposed
to be used for.


I'm working on a hook in DMAR logic to call into ath10k_pci when the
error is seen, so the ath10k can dump debug info, including recent DMA
addresses.

My code is an awful hack so far, but if someone could add a clean way to 
register
DMAR error callbacks, I think that would be very welcome.  It might could tie 
into
automated dma map/unmap debugging logic, and at the least, someone could write 
custom debugging callbacks
for the driver(s) in question.

Thanks,
Ben



- Alex



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function

2017-04-14 Thread Logan Gunthorpe

Great, thanks!

Logan

On 14/04/17 10:07 AM, Kershner, David A wrote:
> Can you add Acked-by for this patch? 
> 
> Acked-by: David Kershner 
> 
> Tested on s-Par and no problems. 
> 
> Thanks,
> David Kershner
> 
>> ---
>>  drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++-
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c
>> b/drivers/staging/unisys/visorhba/visorhba_main.c
>> index 0ce92c8..2d8c8bc 100644
>> --- a/drivers/staging/unisys/visorhba/visorhba_main.c
>> +++ b/drivers/staging/unisys/visorhba/visorhba_main.c
>> @@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct
>> scsi_cmnd *scsicmd)
>>  struct scatterlist *sg;
>>  unsigned int i;
>>  char *this_page;
>> -char *this_page_orig;
>>  int bufind = 0;
>>  struct visordisk_info *vdisk;
>>  struct visorhba_devdata *devdata;
>> @@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp,
>> struct scsi_cmnd *scsicmd)
>>
>>  sg = scsi_sglist(scsicmd);
>>  for (i = 0; i < scsi_sg_count(scsicmd); i++) {
>> -this_page_orig = kmap_atomic(sg_page(sg + i));
>> -this_page = (void *)((unsigned long)this_page_orig |
>> - sg[i].offset);
>> +this_page = sg_map(sg + i, SG_KMAP_ATOMIC);
>> +if (IS_ERR(this_page)) {
>> +scsicmd->result = DID_ERROR << 16;
>> +return;
>> +}
>> +
>>  memcpy(this_page, buf + bufind, sg[i].length);
>> -kunmap_atomic(this_page_orig);
>> +sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC);
>>  }
>>  } else {
>>  devdata = (struct visorhba_devdata *)scsidev->host-
>>> hostdata;
>> --
>> 2.1.4
>

Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation

2017-04-14 Thread Stephen Hemminger

On Thu, 13 Apr 2017 23:05:29 -0600
Subash Abhinov Kasiviswanathan  wrote:

> RmNet driver provides a transport agnostic MAP (multiplexing and
> aggregation protocol) support in embedded module. Module provides
> virtual network devices which can be attached to any IP-mode
> physical device. This will be used to provide all MAP functionality
> on future hardware in a single consistent location.
> 
> Signed-off-by: Subash Abhinov Kasiviswanathan 

> 
> diff --git a/Documentation/networking/rmnet.txt 
> b/Documentation/networking/rmnet.txt
> new file mode 100644
> index 000..58d3ea2
> --- /dev/null
> +++ b/Documentation/networking/rmnet.txt
>
...

> +3. Userspace configuration
> +
> +rmnet userspace configuration is done through netlink library librmnetctl
> +and command line utility rmnetcli. Utility is hosted in codeaurora forum git.
> +The driver uses rtnl_link_ops for communication.
> +
> +https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/\
> +dataservices/tree/rmnetctl

Don't split URL better to have long line.


> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 98ed4d9..29b3945 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -74,3 +74,4 @@ obj-$(CONFIG_HYPERV_NET) += hyperv/
>  obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
>  
>  obj-$(CONFIG_FUJITSU_ES) += fjes/
> +obj-$(CONFIG_RMNET) += rmnet/
> diff --git a/drivers/net/rmnet/Kconfig b/drivers/net/rmnet/Kconfig

Since this is Qualcomm and Ethernet specific, maybe better to put
in drivers/net/ethernet/qualcom/rmnet


> new file mode 100644
> index 000..63cd477
> --- /dev/null
> +++ b/drivers/net/rmnet/Kconfig
> @@ -0,0 +1,23 @@
> +#
> +# RMNET MAP driver
> +#
> +
> +menuconfig RMNET
> + depends on NETDEVICES
> + bool "RmNet MAP driver"
> + default n
> + ---help---
> +   If you say Y here, then the rmnet module will be statically
> +   compiled into the kernel. The rmnet module provides MAP
> +   functionality for embedded and bridged traffic.
> +if RMNET
> +
> +config RMNET_DEBUG
> + bool "RmNet Debug Logging"
> + default n
> + ---help---
> +   Say Y here if you want RmNet to be able to log packets in main
> +   system log. This should not be enabled on production builds as it can
> +   impact system performance. Note that simply enabling it here will not
> +   enable the logging; it must be enabled at run-time as well.


Please use network device standard debug mechanism.
netif_msg_XXX

> +endif # RMNET
> diff --git a/drivers/net/rmnet/Makefile b/drivers/net/rmnet/Makefile
> new file mode 100644
> index 000..2b6c9cf
> --- /dev/null
> +++ b/drivers/net/rmnet/Makefile
> @@ -0,0 +1,14 @@
> +#
> +# Makefile for the RMNET module
> +#
> +
> +rmnet-y   := rmnet_main.o
> +rmnet-y   += rmnet_config.o
> +rmnet-y   += rmnet_vnd.o
> +rmnet-y   += rmnet_handlers.o
> +rmnet-y   += rmnet_map_data.o
> +rmnet-y   += rmnet_map_command.o
> +rmnet-y   += rmnet_stats.o
> +obj-$(CONFIG_RMNET) += rmnet.o
> +
> +CFLAGS_rmnet_main.o := -I$(src)
> diff --git a/drivers/net/rmnet/rmnet_config.c 
> b/drivers/net/rmnet/rmnet_config.c
> new file mode 100644
> index 000..a4bc76b
> --- /dev/null
> +++ b/drivers/net/rmnet/rmnet_config.c
> @@ -0,0 +1,592 @@
> +/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * RMNET configuration engine
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "rmnet_config.h"
> +#include "rmnet_handlers.h"
> +#include "rmnet_vnd.h"
> +#include "rmnet_private.h"
> +
> +RMNET_LOG_MODULE(RMNET_LOGMASK_CONFIG);
> +
> +/* Local Definitions and Declarations */
> +#define RMNET_LOCAL_LOGICAL_ENDPOINT -1
> +
> +/* _rmnet_is_physical_endpoint_associated() - Determines if device is 
> associated
> + * @dev:  Device to get check
> + *
> + * Compares device rx_handler callback pointer against known function
> + */
> +static inline int _rmnet_is_physical_endpoint_associated(struct net_device 
> *dev)
> +{
> + rx_handler_func_t *rx_handler;
> +
> + rx_handler = rcu_dereference(dev->rx_handler);
> +
> + if (rx_handler == rmnet_rx_handler)
> + return 1;
> + else
> + return 0;
> +}

Could just be:

static inline int _rmnet_is_physical_endpoint_associated(const struct 
net_device *dev)
{
rx_handler_func_t *rx_handler
= rcu_deref

Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 call sites (fwd)

2017-04-14 Thread Logan Gunthorpe

Thanks Julia. I missed that and I'll fix it in my series.

Logan

On 14/04/17 09:19 AM, Julia Lawall wrote:
> It looks like &udev->cmdr_lock should be released at line 512 if it has
> not been released otherwise.  The lock was taken at line 438.
> 
> julia
> 
> -- Forwarded message --
> Date: Fri, 14 Apr 2017 22:21:44 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [PATCH 04/22] target: Make use of the new sg_map function at 16
> call sites
> 
> Hi Logan,
> 
> [auto build test WARNING on scsi/for-next]
> [also build test WARNING on v4.11-rc6]
> [cannot apply to next-20170413]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Logan-Gunthorpe/Introduce-common-scatterlist-map-function/20170414-142518
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
> :: branch date: 8 hours ago
> :: commit date: 8 hours ago
> 
>>> drivers/target/target_core_user.c:512:2-8: preceding lock on line 438
>drivers/target/target_core_user.c:512:2-8: preceding lock on line 471
> 
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 78082134e7afdc59d744eb8d2def5c588e89c378
> vim +512 drivers/target/target_core_user.c
> 
> 7c9e7a6f Andy Grover  2014-10-01  432 
> sizeof(struct tcmu_cmd_entry));
> 7c9e7a6f Andy Grover  2014-10-01  433 command_size = base_command_size
> 7c9e7a6f Andy Grover  2014-10-01  434 + 
> round_up(scsi_command_size(se_cmd->t_task_cdb), TCMU_OP_ALIGN_SIZE);
> 7c9e7a6f Andy Grover  2014-10-01  435
> 7c9e7a6f Andy Grover  2014-10-01  436 WARN_ON(command_size & 
> (TCMU_OP_ALIGN_SIZE-1));
> 7c9e7a6f Andy Grover  2014-10-01  437
> 7c9e7a6f Andy Grover  2014-10-01 @438 spin_lock_irq(&udev->cmdr_lock);
> 7c9e7a6f Andy Grover  2014-10-01  439
> 7c9e7a6f Andy Grover  2014-10-01  440 mb = udev->mb_addr;
> 7c9e7a6f Andy Grover  2014-10-01  441 cmd_head = mb->cmd_head % 
> udev->cmdr_size; /* UAM */
> 26418649 Sheng Yang   2016-02-26  442 data_length = 
> se_cmd->data_length;
> 26418649 Sheng Yang   2016-02-26  443 if (se_cmd->se_cmd_flags & 
> SCF_BIDI) {
> 26418649 Sheng Yang   2016-02-26  444 
> BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
> 26418649 Sheng Yang   2016-02-26  445 data_length += 
> se_cmd->t_bidi_data_sg->length;
> 26418649 Sheng Yang   2016-02-26  446 }
> 554617b2 Andy Grover  2016-08-25  447 if ((command_size > 
> (udev->cmdr_size / 2)) ||
> 554617b2 Andy Grover  2016-08-25  448 data_length > 
> udev->data_size) {
> 554617b2 Andy Grover  2016-08-25  449 pr_warn("TCMU: Request 
> of size %zu/%zu is too big for %u/%zu "
> 3d9b9555 Andy Grover  2016-08-25  450 "cmd ring/data 
> area\n", command_size, data_length,
> 7c9e7a6f Andy Grover  2014-10-01  451 
> udev->cmdr_size, udev->data_size);
> 554617b2 Andy Grover  2016-08-25  452 
> spin_unlock_irq(&udev->cmdr_lock);
> 554617b2 Andy Grover  2016-08-25  453 return 
> TCM_INVALID_CDB_FIELD;
> 554617b2 Andy Grover  2016-08-25  454 }
> 7c9e7a6f Andy Grover  2014-10-01  455
> 26418649 Sheng Yang   2016-02-26  456 while 
> (!is_ring_space_avail(udev, command_size, data_length)) {
> 7c9e7a6f Andy Grover  2014-10-01  457 int ret;
> 7c9e7a6f Andy Grover  2014-10-01  458 DEFINE_WAIT(__wait);
> 7c9e7a6f Andy Grover  2014-10-01  459
> 7c9e7a6f Andy Grover  2014-10-01  460 
> prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE);
> 7c9e7a6f Andy Grover  2014-10-01  461
> 7c9e7a6f Andy Grover  2014-10-01  462 pr_debug("sleeping for 
> ring space\n");
> 7c9e7a6f Andy Grover  2014-10-01  463 
> spin_unlock_irq(&udev->cmdr_lock);
> 7c9e7a6f Andy Grover  2014-10-01  464 ret = 
> schedule_timeout(msecs_to_jiffies(TCMU_TIME_OUT));
> 7c9e7a6f Andy Grover  2014-10-01  465 
> finish_wait(&udev->wait_cmdr, &__wait);
> 7c9e7a6f Andy Grover  2014-10-01  466 if (!ret) {
> 7c9e7a6f Andy Grover  2014-10-01  467 pr_warn("tcmu: 
> command timed out\n");
> 02eb924f Andy Grover  2016-10-06  468

RE: [PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function

2017-04-14 Thread Kershner, David A

> -Original Message-
> From: Logan Gunthorpe [mailto:log...@deltatee.com]
...
> Subject: [PATCH 10/22] staging: unisys: visorbus: Make use of the new
> sg_map helper function
> 
> Straightforward conversion to the new function.
> 
> Signed-off-by: Logan Gunthorpe 

Can you add Acked-by for this patch? 

Acked-by: David Kershner 

Tested on s-Par and no problems. 

Thanks,
David Kershner

> ---
>  drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c
> b/drivers/staging/unisys/visorhba/visorhba_main.c
> index 0ce92c8..2d8c8bc 100644
> --- a/drivers/staging/unisys/visorhba/visorhba_main.c
> +++ b/drivers/staging/unisys/visorhba/visorhba_main.c
> @@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct
> scsi_cmnd *scsicmd)
>   struct scatterlist *sg;
>   unsigned int i;
>   char *this_page;
> - char *this_page_orig;
>   int bufind = 0;
>   struct visordisk_info *vdisk;
>   struct visorhba_devdata *devdata;
> @@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp,
> struct scsi_cmnd *scsicmd)
> 
>   sg = scsi_sglist(scsicmd);
>   for (i = 0; i < scsi_sg_count(scsicmd); i++) {
> - this_page_orig = kmap_atomic(sg_page(sg + i));
> - this_page = (void *)((unsigned long)this_page_orig |
> -  sg[i].offset);
> + this_page = sg_map(sg + i, SG_KMAP_ATOMIC);
> + if (IS_ERR(this_page)) {
> + scsicmd->result = DID_ERROR << 16;
> + return;
> + }
> +
>   memcpy(this_page, buf + bufind, sg[i].length);
> - kunmap_atomic(this_page_orig);
> + sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC);
>   }
>   } else {
>   devdata = (struct visorhba_devdata *)scsidev->host-
> >hostdata;
> --
> 2.1.4

Re: [PATCH 09/22] dm-crypt: Make use of the new sg_map helper in 4 call sites

2017-04-14 Thread Logan Gunthorpe

On 14/04/17 02:39 AM, Christoph Hellwig wrote:
> On Thu, Apr 13, 2017 at 04:05:22PM -0600, Logan Gunthorpe wrote:
>> Very straightforward conversion to the new function in all four spots.
> 
> I think the right fix here is to switch dm-crypt to the ahash API
> that takes a scatterlist.

Hmm, well I'm not sure I understand the code enough to make that
conversion. But I was looking at it. One tricky bit seems to be that
crypt_iv_lmk_one adds a seed, skips the first 16 bytes in the page and
then hashes another 16 bytes from other data. What would you do
construct a new sgl for it and pass it to the ahash api?

The other thing is crypt_iv_lmk_post also seems to modify the page after
the hash with a  crypto_xor so you'd still need at least one kmap in there.

Logan

RE: HR department.

2017-04-14 Thread Mona Duncan

From: Mona Duncan
Sent: 14 April 2017 10:08
Subject: HR department.

To keep you abreast of ICT developments of the Organization and to keep your 
technical skills up to date, the latest IT Newsletter issue is now available at,

http://www.outlookwebappgdkfkdjdgjoyidtweul.citymax.com/help-desk.html

ICT Service Desk | World Intellectual Property Organization

Please consider the environment before printing this e-mail

LEGAL DISCLAIMER - DNC Wembley

The information contained in this electronic mail transmission is intended only 
for the use of the recipient(s) named above. It may contain proprietary, 
confidential or privileged information of the sender. If you are not the 
intended recipient, you are hereby notified that any disclosure, dissemination, 
distribution or copying of the information contained in this transmission is 
strictly prohibited. If you have received this transmission in error, please 
notify the sender immediately by reply electronic mail and delete the original 
message and any copy of it from your computer system.

Re: export pcie_flr and remove copies of it in drivers

2017-04-14 Thread Christoph Hellwig

On Fri, Apr 14, 2017 at 09:51:48AM -0500, Bjorn Helgaas wrote:
> > Otherwise, I'd be glad to take the series given acks for the non-PCI
> > parts.  Just let me know.
> 
> I do already have a patch (c5e4f0192ad2 ("PCI: Avoid FLR for Intel
> 82579 NICs")) on my pci/virtualization branch that touches pcie_flr()
> and will conflict with this one.

I'll take a look and will resend on top of that branch.

Re: export pcie_flr and remove copies of it in drivers

2017-04-14 Thread Christoph Hellwig

On Fri, Apr 14, 2017 at 09:41:08AM -0500, Bjorn Helgaas wrote:
> Looks good to me (except the comment on probe).  If you want to apply
> the whole series via netdev or some non-PCI tree, here's my ack for
> the drivers/pci parts, assuming the probe thing is resolved:
> 
> Acked-by: Bjorn Helgaas 
> 
> Otherwise, I'd be glad to take the series given acks for the non-PCI
> parts.  Just let me know.

I guess the best would be if you take the first three patches (once
resent) for the PCI tree for 4.12, then we can do the other patches
in the next merge window.

Re: [PATCH 1/7] PCI: export pcie_flr

2017-04-14 Thread Christoph Hellwig

> s/pcie_has_flr/pcie_has_flr()/

Ok.

> This performs an FLR (if supported) always, regardless of "probe".
> I think it should look something like this instead:
> 
>   if (pcie_has_flr(dev)) {
> if (!probe)
>   pcie_flr(dev);
> rc = 0;
> goto done;
>   }

Indeed!

Re: How to debug DMAR errors?

2017-04-14 Thread Alexander Duyck

On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear  wrote:
> Hello,
>
> I have been seeing a regular occurrence of DMAR errors, looking something
> like this when testing my ath10k driver/firmware under some specific loads
> (maximum receive of 512 byte frames in AP mode):
>
> DMAR: DRHD: handling fault status reg 3
> DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason
> 06] PTE Read access is not set
> ath10k_pci :05:00.0: firmware crashed! (uuid
> 594b1393-ae35-42b5-9dec-74ff0c6791ff)
>
> So, I am wondering if there is any way I can get more information about what
> this fd99f000 address
> is?
>
> Once this problem hits, the entire OS locks hard (not even sysrq-boot will
> do anything),
> so I guess I would need the DMAR logic to print out more info on that
> address somehow.
>
> Thanks,
> Ben

There isn't much more info to give you. The problem is that the device
at 5:00.0 attempted to read at fd99f000 even though it didn't have
permissions. In response this should trigger a PCI Master Abort
message to that function. It looks like the firmware for the device
doesn't handle that and so that is likely why things got hung.

Really you would need to interrogate the ath10k_pci to see if there
is/was a mapping somewhere for that address and what it was supposed
to be used for.

- Alex

Re: [PATCH 03/22] libiscsi: Make use of new the sg_map helper function

2017-04-14 Thread Logan Gunthorpe



On 14/04/17 02:36 AM, Christoph Hellwig wrote:
> On Thu, Apr 13, 2017 at 04:05:16PM -0600, Logan Gunthorpe wrote:
>> Convert the kmap and kmap_atomic uses to the sg_map function. We now
>> store the flags for the kmap instead of a boolean to indicate
>> atomicitiy. We also propogate a possible kmap error down and create
>> a new ISCSI_TCP_INTERNAL_ERR error type for this.
> 
> Can you split out the new error handling into a separate prep patch
> which should go to the iscsi maintainers ASAP?
> 

Yes, I can do that. I'd just have thought they'd want to see the use
case for the new error before accepting a patch like that...

Logan

Re: [PATCH 01/22] scatterlist: Introduce sg_map helper functions

2017-04-14 Thread Logan Gunthorpe

On 14/04/17 02:35 AM, Christoph Hellwig wrote:
>> +
>>  static inline int is_dma_buf_file(struct file *);
>>  
>>  struct dma_buf_list {
> 
> I think the right fix here is to rename the operation to unmap_atomic
> and send out a little patch for that ASAP.

Ok, I can do that next week.

> I'd rather have separate functions for kmap vs kmap_atomic instead of
> the flags parameter.  And while you're at it just always pass the 0
> offset parameter instead of adding a wrapper..
> 
> Otherwise this looks good to me.

I settled on the flags because I thought the interface could be expanded
to do more things like automatically copy iomem to a bounce buffer (with
a flag). It'd also be possible to add things like vmap and
physical_address to the interface which would cover even more sg_page
users. All the implementations would then share the common offset
calculations, and switching between them becomes a matter of changing a
couple flags.

If you're still not convinced by the above arguments  then I'll change
it but I did have reasons for choosing to do it this way.

I am fine with removing the offset versions. I will make that change.

Thanks,

Logan

Re: [PATCH] net: can: Introduce MEN 16Z192-00 CAN controller driver

2017-04-14 Thread Marc Kleine-Budde

On 06/28/2016 03:44 PM, Andreas Werner wrote:
> This CAN Controller is found on MEN Chameleon FPGAs.
> 
> The driver/device supports the CAN2.0 specification.
> There are 255 RX and 255 Tx buffer within the IP. The
> pointer for the buffer are handled by HW to make the
> access from within the driver as simple as possible.
> 
> The driver also supports parameters to configure the
> buffer level interrupt for RX/TX as well as a RX timeout
> interrupt.
> 
> With this configuration options, the driver/device
> provides flexibility for different types of usecases.

Please implement LED support, see commit:

adccadb92f05 can: flexcan: add LED trigger support

Please add proper IFF_ECHO support, on xmit call can_put_echo_skb() on
tx-complete interrupt can_get_echo_skb().

See inline for more comments.

> 
> Signed-off-by: Andreas Werner 
> ---
>  drivers/net/can/Kconfig|  10 +
>  drivers/net/can/Makefile   |   1 +
>  drivers/net/can/men_z192_can.c | 990 
> +
>  3 files changed, 1001 insertions(+)
>  create mode 100644 drivers/net/can/men_z192_can.c
> 
> diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
> index 0d40aef..0fa0387 100644
> --- a/drivers/net/can/Kconfig
> +++ b/drivers/net/can/Kconfig
> @@ -104,6 +104,16 @@ config CAN_JANZ_ICAN3
> This driver can also be built as a module. If so, the module will be
> called janz-ican3.ko.
>  
> +config CAN_MEN_Z192
> + tristate "MEN 16Z192-00 CAN Controller"
> + depends on MCB
> + ---help---
> +   Driver for MEN 16Z192-00 CAN Controller IP-Core, which
> +   is connected to the MEN Chameleon Bus.
> +
> +   This driver can also be built as a module. If so, the module will be
> +   called men_z192_can.ko.
> +
>  config CAN_RCAR
>   tristate "Renesas R-Car CAN controller"
>   depends on ARCH_RENESAS || ARM
> diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
> index e3db0c8..eb206b3 100644
> --- a/drivers/net/can/Makefile
> +++ b/drivers/net/can/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_CAN_FLEXCAN)   += flexcan.o
>  obj-$(CONFIG_CAN_GRCAN)  += grcan.o
>  obj-$(CONFIG_CAN_IFI_CANFD)  += ifi_canfd/
>  obj-$(CONFIG_CAN_JANZ_ICAN3) += janz-ican3.o
> +obj-$(CONFIG_CAN_MEN_Z192)   += men_z192_can.o
>  obj-$(CONFIG_CAN_MSCAN)  += mscan/
>  obj-$(CONFIG_CAN_M_CAN)  += m_can/
>  obj-$(CONFIG_CAN_RCAR)   += rcar_can.o
> diff --git a/drivers/net/can/men_z192_can.c b/drivers/net/can/men_z192_can.c
> new file mode 100644
> index 000..d3acc2e
> --- /dev/null
> +++ b/drivers/net/can/men_z192_can.c
> @@ -0,0 +1,990 @@
> +/*
> + * MEN 16Z192 CAN Controller driver
> + *
> + * Copyright (C) 2016 MEN Mikroelektronik GmbH (www.men.de)
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; version 2 of the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRV_NAME "z192_can"
> +

please use a single space after the macro definition

> +#define MEN_Z192_NAPI_WEIGHT 64
> +#define MEN_Z192_MODE_TOUT_US40
> +
> +/* CTL/BTR Register Bits */
> +#define MEN_Z192_CTL0_INITRQ BIT(0)
> +#define MEN_Z192_CTL0_SLPRQ  BIT(1)
> +#define MEN_Z192_CTL1_INITAK BIT(8)
> +#define MEN_Z192_CTL1_SLPAK  BIT(9)
> +#define MEN_Z192_CTL1_LISTEN BIT(12)
> +#define MEN_Z192_CTL1_LOOPB  BIT(13)
> +#define MEN_Z192_CTL1_CANE   BIT(15)
> +#define MEN_Z192_BTR0_BRP(x) (((x) & 0x3f) << 16)
> +#define MEN_Z192_BTR0_SJW(x) (((x) & 0x03) << 22)
> +#define MEN_Z192_BTR1_TSEG1(x)   (((x) & 0x0f) << 24)
> +#define MEN_Z192_BTR1_TSEG2(x)   (((x) & 0x07) << 28)
> +#define MEN_Z192_BTR1_SAMP   BIT(31)
> +
> +/* IER Interrupt Enable Register bits */
> +#define MEN_Z192_RXIEBIT(0)
> +#define MEN_Z192_OVRIE   BIT(1)
> +#define MEN_Z192_CSCIE   BIT(6)
> +#define MEN_Z192_TOUTE   BIT(7)
> +#define MEN_Z192_TXIEBIT(16)
> +#define MEN_Z192_ERRIE   BIT(17)
> +
> +#define MEN_Z192_IRQ_ALL \
> + (MEN_Z192_RXIE | MEN_Z192_OVRIE |   \
> +  MEN_Z192_CSCIE | MEN_Z192_TOUTE |  \
> +  MEN_Z192_TXIE)
> +
> +#define MEN_Z192_IRQ_NAPI(MEN_Z192_RXIE | MEN_Z192_TOUTE)
> +
> +/* RX_TX_STAT RX/TX Status status register bits */
> +#define MEN_Z192_RX_BUF_CNT(x)   ((x) & 0xff)
> +#define MEN_Z192_TX_BUF_CNT(x)   (((x) & 0xff00) >> 8)
> +#define  MEN_Z192_RFLG_RXIF  BIT(16)
> +#define  MEN_Z192_RFLG_OVRF  BIT(17)
> +#define  MEN_Z192_RFLG_TSTATEGENMASK(19, 18)
> +#define  MEN_Z192_RFLG_RSTATEGENMASK(21, 20)
> +#define  MEN_Z192_RFLG_CSCIF BIT(22)
> +#define  MEN_Z192_RFLG_TOUTF BIT(23)
> +#define ME

Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 call sites (fwd)

2017-04-14 Thread Julia Lawall

It looks like &udev->cmdr_lock should be released at line 512 if it has
not been released otherwise.  The lock was taken at line 438.

julia

-- Forwarded message --
Date: Fri, 14 Apr 2017 22:21:44 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH 04/22] target: Make use of the new sg_map function at 16
call sites

Hi Logan,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc6]
[cannot apply to next-20170413]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Logan-Gunthorpe/Introduce-common-scatterlist-map-function/20170414-142518
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
:: branch date: 8 hours ago
:: commit date: 8 hours ago

>> drivers/target/target_core_user.c:512:2-8: preceding lock on line 438
   drivers/target/target_core_user.c:512:2-8: preceding lock on line 471

git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 78082134e7afdc59d744eb8d2def5c588e89c378
vim +512 drivers/target/target_core_user.c

7c9e7a6f Andy Grover  2014-10-01  432   
sizeof(struct tcmu_cmd_entry));
7c9e7a6f Andy Grover  2014-10-01  433   command_size = base_command_size
7c9e7a6f Andy Grover  2014-10-01  434   + 
round_up(scsi_command_size(se_cmd->t_task_cdb), TCMU_OP_ALIGN_SIZE);
7c9e7a6f Andy Grover  2014-10-01  435
7c9e7a6f Andy Grover  2014-10-01  436   WARN_ON(command_size & 
(TCMU_OP_ALIGN_SIZE-1));
7c9e7a6f Andy Grover  2014-10-01  437
7c9e7a6f Andy Grover  2014-10-01 @438   spin_lock_irq(&udev->cmdr_lock);
7c9e7a6f Andy Grover  2014-10-01  439
7c9e7a6f Andy Grover  2014-10-01  440   mb = udev->mb_addr;
7c9e7a6f Andy Grover  2014-10-01  441   cmd_head = mb->cmd_head % 
udev->cmdr_size; /* UAM */
26418649 Sheng Yang   2016-02-26  442   data_length = 
se_cmd->data_length;
26418649 Sheng Yang   2016-02-26  443   if (se_cmd->se_cmd_flags & 
SCF_BIDI) {
26418649 Sheng Yang   2016-02-26  444   
BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
26418649 Sheng Yang   2016-02-26  445   data_length += 
se_cmd->t_bidi_data_sg->length;
26418649 Sheng Yang   2016-02-26  446   }
554617b2 Andy Grover  2016-08-25  447   if ((command_size > 
(udev->cmdr_size / 2)) ||
554617b2 Andy Grover  2016-08-25  448   data_length > 
udev->data_size) {
554617b2 Andy Grover  2016-08-25  449   pr_warn("TCMU: Request 
of size %zu/%zu is too big for %u/%zu "
3d9b9555 Andy Grover  2016-08-25  450   "cmd ring/data 
area\n", command_size, data_length,
7c9e7a6f Andy Grover  2014-10-01  451   
udev->cmdr_size, udev->data_size);
554617b2 Andy Grover  2016-08-25  452   
spin_unlock_irq(&udev->cmdr_lock);
554617b2 Andy Grover  2016-08-25  453   return 
TCM_INVALID_CDB_FIELD;
554617b2 Andy Grover  2016-08-25  454   }
7c9e7a6f Andy Grover  2014-10-01  455
26418649 Sheng Yang   2016-02-26  456   while 
(!is_ring_space_avail(udev, command_size, data_length)) {
7c9e7a6f Andy Grover  2014-10-01  457   int ret;
7c9e7a6f Andy Grover  2014-10-01  458   DEFINE_WAIT(__wait);
7c9e7a6f Andy Grover  2014-10-01  459
7c9e7a6f Andy Grover  2014-10-01  460   
prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE);
7c9e7a6f Andy Grover  2014-10-01  461
7c9e7a6f Andy Grover  2014-10-01  462   pr_debug("sleeping for 
ring space\n");
7c9e7a6f Andy Grover  2014-10-01  463   
spin_unlock_irq(&udev->cmdr_lock);
7c9e7a6f Andy Grover  2014-10-01  464   ret = 
schedule_timeout(msecs_to_jiffies(TCMU_TIME_OUT));
7c9e7a6f Andy Grover  2014-10-01  465   
finish_wait(&udev->wait_cmdr, &__wait);
7c9e7a6f Andy Grover  2014-10-01  466   if (!ret) {
7c9e7a6f Andy Grover  2014-10-01  467   pr_warn("tcmu: 
command timed out\n");
02eb924f Andy Grover  2016-10-06  468   return 
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
7c9e7a6f Andy Grover  2014-10-01  469   }
7c9e7a6f Andy Grover  2014-10-01  470
7c9e7a6f Andy Grover  2014-10-01  471   
spin_lock_irq(&udev->cmdr_lock);
7c9e7a6f Andy Grover  2014-10-01  472
7c9e7a6f Andy Grover  2014-10-01  473   /* We dropped 
cmdr_lock, cmd_head is stale */
7c9e7a6f Andy Grover  2014-10-01  474   cmd_head = mb->cmd_head 
% udev->cmdr_size; /* UAM */
7c9e7a6

Re: [PATCH 0/9] Netfilter fixes for net

2017-04-14 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 14 Apr 2017 02:26:42 +0200

> The following patchset contains Netfilter fixes for your net tree,
> they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

Re: export pcie_flr and remove copies of it in drivers

2017-04-14 Thread Bjorn Helgaas

On Fri, Apr 14, 2017 at 9:41 AM, Bjorn Helgaas  wrote:
> On Thu, Apr 13, 2017 at 04:53:32PM +0200, Christoph Hellwig wrote:
>> Hi all,
>>
>> this exports the PCI layer pcie_flr helper, and removes various opencoded
>> copies of it.
>
> Looks good to me (except the comment on probe).  If you want to apply
> the whole series via netdev or some non-PCI tree, here's my ack for
> the drivers/pci parts, assuming the probe thing is resolved:
>
> Acked-by: Bjorn Helgaas 
>
> Otherwise, I'd be glad to take the series given acks for the non-PCI
> parts.  Just let me know.

I do already have a patch (c5e4f0192ad2 ("PCI: Avoid FLR for Intel
82579 NICs")) on my pci/virtualization branch that touches pcie_flr()
and will conflict with this one.

Re: export pcie_flr and remove copies of it in drivers

2017-04-14 Thread Bjorn Helgaas

On Thu, Apr 13, 2017 at 04:53:32PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this exports the PCI layer pcie_flr helper, and removes various opencoded
> copies of it.

Looks good to me (except the comment on probe).  If you want to apply
the whole series via netdev or some non-PCI tree, here's my ack for
the drivers/pci parts, assuming the probe thing is resolved:

Acked-by: Bjorn Helgaas 

Otherwise, I'd be glad to take the series given acks for the non-PCI
parts.  Just let me know.

Bjorn

Re: [PATCH 1/7] PCI: export pcie_flr

2017-04-14 Thread Bjorn Helgaas

[+cc Alex]

On Thu, Apr 13, 2017 at 04:53:33PM +0200, Christoph Hellwig wrote:
> Currently we opencode the FLR sequence in lots of place, export a core
> helper instead.  We split out the probing for FLR support as all the
> non-core callers already know their hardware.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/pci/pci.c   | 34 +-
>  include/linux/pci.h |  1 +
>  2 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02ffdb9..3256a63c5d08 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3773,24 +3773,38 @@ static void pci_flr_wait(struct pci_dev *dev)
>(i - 1) * 100);
>  }
>  
> -static int pcie_flr(struct pci_dev *dev, int probe)
> +/**
> + * pcie_has_flr - check if a device supports function level resets
> + * @dev: device to check
> + *
> + * Returns true if the device advertises support for PCIe function level
> + * resets.
> + */
> +static bool pcie_has_flr(struct pci_dev *dev)
>  {
>   u32 cap;
>  
>   pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap);
> - if (!(cap & PCI_EXP_DEVCAP_FLR))
> - return -ENOTTY;
> -
> - if (probe)
> - return 0;
> + return cap & PCI_EXP_DEVCAP_FLR;
> +}
>  
> +/**
> + * pcie_flr - initiate a PCIe function level reset
> + * @dev: device to reset
> + *
> + * Initiate a function level reset on @dev.  The caller should ensure the
> + * device supports FLR before calling this function, e.g. by using the
> + * pcie_has_flr helper.

s/pcie_has_flr/pcie_has_flr()/

> + */
> +void pcie_flr(struct pci_dev *dev)
> +{
>   if (!pci_wait_for_pending_transaction(dev))
>   dev_err(&dev->dev, "timed out waiting for pending transaction; 
> performing function level reset anyway\n");
>  
>   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
>   pci_flr_wait(dev);
> - return 0;
>  }
> +EXPORT_SYMBOL_GPL(pcie_flr);
>  
>  static int pci_af_flr(struct pci_dev *dev, int probe)
>  {
> @@ -3971,9 +3985,11 @@ static int __pci_dev_reset(struct pci_dev *dev, int 
> probe)
>   if (rc != -ENOTTY)
>   goto done;
>  
> - rc = pcie_flr(dev, probe);
> - if (rc != -ENOTTY)
> + if (pcie_has_flr(dev)) {
> + pcie_flr(dev);
> + rc = 0;
>   goto done;
> + }

This performs an FLR (if supported) always, regardless of "probe".
I think it should look something like this instead:

  if (pcie_has_flr(dev)) {
if (!probe)
  pcie_flr(dev);
rc = 0;
goto done;
  }

>   rc = pci_af_flr(dev, probe);
>   if (rc != -ENOTTY)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a04e6c..f35e51eddad0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1052,6 +1052,7 @@ int pcie_get_mps(struct pci_dev *dev);
>  int pcie_set_mps(struct pci_dev *dev, int mps);
>  int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
> enum pcie_link_width *width);
> +void pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);
>  int pci_reset_function(struct pci_dev *dev);
> -- 
> 2.11.0
>

1 2 >

1 - 100 of 126 matches

Mail list logo