AF_PACKET V4/AF_XDP userspace API questions

2018-01-30 Thread Ilias Apalodimas
We've noticed 3 different hardware approaches to receiving payloads:

1. Host driver needs to pre-load descriptor ring with addresses of RAM
buffers to write arriving data.
The "standard" functionality for most NICs is (in little detail) fetch
the descriptor, write the payload to host RAM and update the
descriptor accordingly.
So for these NICs, buffer addresses are provided in RX descriptors (RX
descriptors are a two-way communication entity).
This translates to a "1 ring + 1 buffer array" model, or the packet
array model in short.

2. There's a category of NICs (Chelsio and Netcope are the ones we are
aware of) that split that into two one-way entities:
One to communicate buffer addresses from host to NIC and one to
communicate packets/payloads from NIC to host.
So the driver provides a set of unstructured, contiguous memory areas
to the NIC, the NIC decides where to place the packets in memory and
updates the descriptors accordingly (the descriptor ring is not
pre-loaded with any data and the NIC is free to write the packet
anywhere in the provided contiguous memory).
This is a "1 ring + 1 set of areas" model, or the tape model in short.

3. The last hardware approach we are aware of is NICs to which you provide
multiple buffer arrays (128, 256, 1500, 9000 etc).
The NIC then decides in which array slot to place the packet depending
on its size.
This is the "1 ring + X buffer arrays" model, or the multi packet array
model in short.

Are memory schemes 2 and 3 supported? If not, do you plan on supporting them?

Regards
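
The three receive models described above can be sketched in plain C. This is
only an illustrative userspace model, not driver code: every struct and
function name below (pa_desc, tape_desc, pa_rx, tape_rx, mpa_pick_class) is
hypothetical, and the "NIC" is simulated with memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model 1, "packet array": the host pre-loads each descriptor with a
 * buffer address; the NIC fills that buffer and writes back the length. */
struct pa_desc {
    uint8_t *buf;   /* host -> NIC: where to write the packet */
    uint32_t len;   /* NIC -> host: bytes written, 0 = not yet used */
};

/* Model 2, "tape": descriptors start empty; the NIC reports where in the
 * contiguous area it placed each packet. */
struct tape_desc {
    uint32_t off;   /* NIC -> host: offset into the area */
    uint32_t len;   /* NIC -> host: packet length */
};

struct tape {
    uint8_t *area;  /* unstructured, contiguous memory given to the NIC */
    size_t size;
    size_t head;    /* next free byte, advanced by the simulated NIC */
};

/* Simulated NIC receive for model 1: one packet per pre-loaded buffer. */
static int pa_rx(struct pa_desc *d, size_t buf_size,
                 const void *pkt, uint32_t len)
{
    if (len == 0 || len > buf_size || d->len != 0)
        return -1;
    memcpy(d->buf, pkt, len);
    d->len = len;
    return 0;
}

/* Simulated NIC receive for model 2: packets are packed back to back. */
static int tape_rx(struct tape *t, struct tape_desc *d,
                   const void *pkt, uint32_t len)
{
    assert(t->head <= t->size);
    if (len == 0 || len > t->size - t->head)
        return -1;
    memcpy(t->area + t->head, pkt, len);
    d->off = (uint32_t)t->head;
    d->len = len;
    t->head += len;
    return 0;
}

/* Model 3, "multi packet array": pick the smallest buffer class that
 * fits. sizes[] must be sorted in ascending order. */
static int mpa_pick_class(const uint32_t *sizes, size_t n, uint32_t len)
{
    for (size_t i = 0; i < n; i++)
        if (len <= sizes[i])
            return (int)i;
    return -1;
}
```

The point of the sketch is the direction of information flow: in model 1 the
host chooses the address, in model 2 the NIC chooses it and reports an
offset, and in model 3 the NIC chooses among host-provided size classes.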


Re: AF_PACKET V4/AF_XDP userspace API questions

2018-01-30 Thread Ilias Apalodimas
Really sorry for the noise, mail is in lkml properly now.
I failed at marking it as plain text.

You can disregard this one.

Regards
Ilias



Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Florian Westphal
Michal Hocko  wrote:
> On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > Kirill A. Shutemov  wrote:
> [...]
> > > I hate what I'm saying, but I guess we need some tunable here.
> > > Not sure what exactly.
> > 
> > Would memcg help?
> 
> That really depends. I would have to check whether vmalloc path obeys
> __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> that shouldn't be a big deal). But then the other potential problem is
> the life time of the xt_table_info (or other potentially large) data
> structures. Are they bound to any process life time?

No.

> Because if they are
> not then the OOM killer will not help. The OOM panic earlier in this
> thread suggests it doesn't because the test case managed to eat all the
> available memory and killed all the eligible tasks which didn't help.

Yes, which is why we do not want any OOM killer invocation in the first
place...

> So in some sense the memcg would help to stop the excessive allocation,
> but it wouldn't resolve it other than kill all tasks in the affected
> memcg/container. Whether this is sufficient or not, I dunno. It sounds
> quite suboptimal to me. But it is true this would be less tricky than
> adding a global knob...

A global knob doesn't really help at all: I can add multiple large
iptables rulesets (so we would have to account), and we have the same issue
in virtually all of networking, so we need limits for interface count,
tunnel count, ipsec policies/SAs, nftables, tc, etc.


Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread Marcelo Ricardo Leitner
Hi,

On Tue, Jan 30, 2018 at 12:14:46PM +1100, Daniel Axtens wrote:
> If you take a GSO skb, and split it into packets, will the MAC
> length (L2 + L3 + L4 headers + payload) of those packets be small
> enough to fit within a given length?
> 
> Move skb_gso_mac_seglen() to skbuff.h with other related functions
> like skb_gso_network_seglen() so we can use it, and then create
> skb_gso_validate_mac_len to do the full calculation.
> 
> Signed-off-by: Daniel Axtens 
> ---
>  include/linux/skbuff.h | 16 +
>  net/core/skbuff.c      | 65 +++---
>  net/sched/sch_tbf.c    | 10 
>  3 files changed, 67 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index b8e0da6c27d6..242d6773c7c2 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
>  void skb_scrub_packet(struct sk_buff *skb, bool xnet);
>  unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
>  bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
> +bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
>  struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
>  struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
>  int skb_ensure_writable(struct sk_buff *skb, int write_len);
> @@ -4120,6 +4121,21 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
>   return hdr_len + skb_gso_transport_seglen(skb);
>  }
>  
> +/**
> + * skb_gso_mac_seglen - Return length of individual segments of a gso packet
> + *
> + * @skb: GSO skb
> + *
> + * skb_gso_mac_seglen is used to determine the real size of the
> + * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
> + * headers (TCP/UDP).
> + */
> +static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
> +{
> + unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
> + return hdr_len + skb_gso_transport_seglen(skb);
> +}
> +
>  /* Local Checksum Offload.
>   * Compute outer checksum based on the assumption that the
>   * inner checksum will be offloaded later.
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 01e8285aea73..55d84ab7d093 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4914,36 +4914,73 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
>  EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
>  
>  /**
> - * skb_gso_validate_mtu - Return in case such skb fits a given MTU
> + * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
>   *
> - * @skb: GSO skb
> - * @mtu: MTU to validate against
> + * There are a couple of instances where we have a GSO skb, and we
> + * want to determine what size it would be after it is segmented.
>   *
> - * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
> - * once split.
> + * We might want to check:
> + * -L3+L4+payload size (e.g. IP forwarding)
> + * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
> + *
> + * This is a helper to do that correctly considering GSO_BY_FRAGS.
> + *
> + * @seg_len: The segmented length (from skb_gso_*_seglen). In the
> + *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
> + *
> + * @max_len: The maximum permissible length.
> + *
> + * Returns true if the segmented length <= max length.
>   */
> -bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
> -{
> +static inline bool skb_gso_size_check(const struct sk_buff *skb,
> +   unsigned int seg_len,
> +   unsigned int max_len) {
>   const struct skb_shared_info *shinfo = skb_shinfo(skb);
>   const struct sk_buff *iter;
> - unsigned int hlen;
> -
> - hlen = skb_gso_network_seglen(skb);
>  
>   if (shinfo->gso_size != GSO_BY_FRAGS)
> - return hlen <= mtu;
> + return seg_len <= max_len;
>  
>   /* Undo this so we can re-use header sizes */
> - hlen -= GSO_BY_FRAGS;
> + seg_len -= GSO_BY_FRAGS;
>  
>   skb_walk_frags(skb, iter) {
> - if (hlen + skb_headlen(iter) > mtu)
> + if (seg_len + skb_headlen(iter) > max_len)
>   return false;
>   }
>  
>   return true;
>  }
> -EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
> +
> +/**
> + * skb_gso_validate_mtu - Return in case such skb fits a given MTU
> + *
> + * @skb: GSO skb
> + * @mtu: MTU to validate against
> + *
> + * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
> + * once split.
> + */
> +bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
> +{
> + return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
> +}
> +EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);

This export does not match the function name: the function above is
skb_gso_validate_mtu(), but the export names skb_gso_validate_network_len.
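
For readers who want to see what the refactored helper computes, here is a
minimal userspace model of the size check from the quoted patch. It is a
sketch under stated assumptions, not kernel code: struct model_skb and the
model_* functions are hypothetical stand-ins, the frag list is reduced to an
array of head lengths, and GSO_BY_FRAGS is assumed to be 0xFFFF as in the
kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel gso_size meaning "segment along the frag list".
 * Assumed to mirror the kernel's value. */
#define GSO_BY_FRAGS 0xFFFFu

/* Hypothetical, stripped-down stand-in for the sk_buff fields the check reads. */
struct model_skb {
    unsigned int gso_size;
    const unsigned int *frag_headlen; /* head length of each frag-list skb */
    size_t nr_frags;
};

/* Mirrors the logic of skb_gso_size_check() in the patch. */
static bool model_gso_size_check(const struct model_skb *skb,
                                 unsigned int seg_len, unsigned int max_len)
{
    if (skb->gso_size != GSO_BY_FRAGS)
        return seg_len <= max_len;

    /* seg_len was computed as headers + GSO_BY_FRAGS; recover the headers. */
    assert(seg_len >= GSO_BY_FRAGS);
    seg_len -= GSO_BY_FRAGS;

    for (size_t i = 0; i < skb->nr_frags; i++)
        if (seg_len + skb->frag_headlen[i] > max_len)
            return false;
    return true;
}

/* L3 + L4 headers + payload against an MTU (the existing check). */
static bool model_validate_mtu(const struct model_skb *skb,
                               unsigned int l3l4_hdr, unsigned int mtu)
{
    return model_gso_size_check(skb, l3l4_hdr + skb->gso_size, mtu);
}

/* L2 + L3 + L4 headers + payload against a length (the new check). */
static bool model_validate_mac_len(const struct model_skb *skb,
                                   unsigned int l2l3l4_hdr, unsigned int len)
{
    return model_gso_size_check(skb, l2l3l4_hdr + skb->gso_size, len);
}
```

The two public checks differ only in which header length is added to the
segment size, which is why the patch factors the comparison into one helper.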



Re: [PATCH v3 0/2] bnx2x: disable GSO on too-large packets

2018-01-30 Thread Marcelo Ricardo Leitner
On Tue, Jan 30, 2018 at 12:14:45PM +1100, Daniel Axtens wrote:
> We observed a case where a packet received on an ibmveth device had a
> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
> device, where it caused a firmware assert. This is described in detail
> at [0].
> 
> Ultimately we want a fix in the core, but that is very tricky to
> backport. So for now, just stop the bnx2x driver from crashing.
> 
> When net-next re-opens I will send the fix to the core and a revert
> for this.
> 
> Marcelo: I have left renaming skb_gso_validate_mtu() for the
> next series.

Alright! Just need to sync the EXPORT_ in there.
(I have no further comments, LGTM)

> 
> Thanks,
> Daniel
>
> [0]: https://patchwork.ozlabs.org/patch/859410/
> 
> Cc: manish.cho...@cavium.com
> Cc: Jason Wang 
> Cc: Pravin Shelar 
> Cc: Marcelo Ricardo Leitner 
> 
> Daniel Axtens (2):
>   net: create skb_gso_validate_mac_len()
>   bnx2x: disable GSO where gso_size is too big for hardware
> 
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  9 
>  include/linux/skbuff.h                           | 16 ++
>  net/core/skbuff.c                                | 65 +++-
>  net/sched/sch_tbf.c                              | 10 
>  4 files changed, 76 insertions(+), 24 deletions(-)
> 
> -- 
> 2.14.1
> 


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Kirill A. Shutemov
On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> Michal Hocko  wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov  wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > > 
> > > Would memcg help?
> > 
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > > structures. Are they bound to any process life time?
> 
> No.

Well, IIUC they are bound to the net namespace life time, so killing all
processes in the namespace would help to get memory back. :)

-- 
 Kirill A. Shutemov


Re: [PATCH net-next,v2 2/2] net: sched: add em_ipt ematch for calling xtables matches

2018-01-30 Thread Eyal Birger
On Sun, 28 Jan 2018 19:22:12 -0800
Cong Wang  wrote:

> On Fri, Jan 26, 2018 at 11:57 AM, Eyal Birger 
> wrote:
> > On Fri, Jan 26, 2018 at 8:50 PM, Pablo Neira Ayuso
> >  wrote:  
> >> Isn't there a way to reject the use of this from ->change()? ie.
> >> from control plane configuration.  
> >
> > I wasn't able to find a simple way of doing so:
> >
> > - AFAIU tc filters are detached from the qdiscs they operate on via
> > tcf_block instances
> >   that may be shared by different qdiscs. I was not able to be sure
> > that filters attached to ingress qdiscs via tcf_blocks at
> > configuration time cannot be later be shared
> >   with non ingress qdiscs. Nor was I able to find another classifier
> > making the ingress/egress
> >   distinction at configuration time.
> >
> > - ematches are not provided with 'ingress/egress' information at
> > 'change()' invocation, though
> >   of course the infrastructure could be extended to provide this,
> > given the distinction is available.
> >  
> 
> In the past you can check tp->q, but now we support shared tc filter
> block, so it is hard. I think your v1 is okay, which just silently
> passes the match on egress side. Or maybe we can just add a pr_info()
> unconditionally in em_ipt_change() saying only ingress is supported.

Thanks!

The motivation for allowing only ingress was to avoid skb modifications
on egress as when running the match on egress, skb->data must point to
the L3 header. Looking again at the calling flow e.g. from __dev_queue_xmit(),
I don't see a case where skb may be shared.

Similarly on ingress flow, sch_handle_ingress() modifies the skb, and
tc actions perform skb modification without share checking.

So as far as I can tell skb_pull() on the match is safe.
Is there a different code path I should be looking for?

If that is the case, perhaps the v1 approach supporting both directions
including skb_pull() can be resubmitted without the pr_notice once
net-next is open.

Eyal.






Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Dmitry Vyukov
On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
 wrote:
> On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
>> Michal Hocko  wrote:
>> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
>> > > Kirill A. Shutemov  wrote:
>> > [...]
>> > > > I hate what I'm saying, but I guess we need some tunable here.
>> > > > Not sure what exactly.
>> > >
>> > > Would memcg help?
>> >
>> > That really depends. I would have to check whether vmalloc path obeys
>> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
>> > that shouldn't be a big deal). But then the other potential problem is
>> > the life time of the xt_table_info (or other potentially large) data
>> > structures. Are they bound to any process life time?
>>
>> No.
>
> Well, IIUC they are bound to the net namespace life time, so killing all
> processes in the namespace would help to get memory back. :)

... unless the namespace is mounted into file system.

Let's start with NOWARN as that's what the kernel generally uses for
allocations with user-controllable size. ENOMEM is roughly as
informative as the WARNING message in this case.

I think we also need to consider setting up a memory cgroup for
syzkaller test processes (we do RLIMIT_AS, but that's weak).


Re: possible deadlock in xt_find_table_lock

2018-01-30 Thread Florian Westphal
#syz dup: possible deadlock in do_ip_getsockopt


Re: possible deadlock in xt_find_revision

2018-01-30 Thread Florian Westphal
#syz dup: possible deadlock in do_ip_getsockopt


Backport Mellanox mlx5 patches to stable 4.9.y

2018-01-30 Thread Marta Rybczynska
Hello Mellanox maintainers,
I'd like to ask you to OK backporting two patches in the mlx5 driver to the 4.9 stable
tree (they have been in master for some time already).

We have multiple deployments on 4.9 that are running into the bug fixed by those
patches. When we deploy patched kernels, the issue disappears.

The patches are:
1410a90ae449061b7e1ae19d275148f36948801b net/mlx5: Define interface bits for fencing UMR wqe
6e8484c5cf07c7ee632587e98c1a12d319dacb7c RDMA/mlx5: set UMR wqe fence according to HCA cap
 
Regards,
Marta


[PATCH] ipmr: Fix ptrdiff_t print formatting

2018-01-30 Thread James Hogan
ipmr_vif_seq_show() prints the difference between two pointers with the
format string %2zd (z for size_t), however the correct format string is
%2td instead (t for ptrdiff_t).

The same bug in ip6mr_vif_seq_show() was already fixed long ago by
commit d430a227d272 ("bogus format in ip6mr").

Signed-off-by: James Hogan 
Cc: Alexey Kuznetsov 
Cc: "David S. Miller" 
Cc: Hideaki YOSHIFUJI 
Cc: netdev@vger.kernel.org
---
 net/ipv4/ipmr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index fd5f19c988e4..0a279d99a532 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -3022,7 +3022,7 @@ static int ipmr_vif_seq_show(struct seq_file *seq, void *v)
const char *name =  vif->dev ? vif->dev->name : "none";
 
seq_printf(seq,
-  "%2zd %-10s %8ld %7ld  %8ld %7ld %05X %08X %08X\n",
+  "%2td %-10s %8ld %7ld  %8ld %7ld %05X %08X %08X\n",
   vif - mrt->vif_table,
   name, vif->bytes_in, vif->pkt_in,
   vif->bytes_out, vif->pkt_out,
-- 
2.13.6
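
The same formatting rule applies in userspace C, so the fix can be
illustrated outside the kernel. The sketch below is hypothetical
(struct model_vif and format_vif_index are made-up names, and snprintf()
stands in for seq_printf()); it only shows that a pointer difference has
type ptrdiff_t and is printed with the `t` length modifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an entry in mrt->vif_table. */
struct model_vif {
    long bytes_in;
};

/* Format the index of v within table, as the "%2td" column does. */
static int format_vif_index(char *out, size_t n,
                            const struct model_vif *table,
                            const struct model_vif *v)
{
    assert(v >= table);

    /* Subtracting two pointers into the same array yields a ptrdiff_t,
     * so the matching conversion is %td. %zd expects the signed
     * counterpart of size_t, which is not required to be the same type. */
    ptrdiff_t idx = v - table;

    return snprintf(out, n, "%2td", idx);
}
```

On most common ABIs ptrdiff_t and the signed counterpart of size_t happen to
have the same width, which is why the old format string worked in practice;
using `t` simply matches the type the C standard specifies.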



Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Michal Hocko
On Tue 30-01-18 09:11:27, Florian Westphal wrote:
> Michal Hocko  wrote:
> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > > Kirill A. Shutemov  wrote:
> > [...]
> > > > I hate what I'm saying, but I guess we need some tunable here.
> > > > Not sure what exactly.
> > > 
> > > Would memcg help?
> > 
> > That really depends. I would have to check whether vmalloc path obeys
> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > that shouldn't be a big deal). But then the other potential problem is
> > the life time of the xt_table_info (or other potentially large) data
> > > structures. Are they bound to any process life time?
> 
> No.
> 
> > Because if they are
> > not then the OOM killer will not help. The OOM panic earlier in this
> > thread suggests it doesn't because the test case managed to eat all the
> > available memory and killed all the eligible tasks which didn't help.
> 
> Yes, which is why we do not want any OOM killer invocation in first
> place...

The problem is that once you eat that memory and keep asking for more
until you fail with ENOMEM, the OOM is simply unavoidable.
-- 
Michal Hocko
SUSE Labs


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Michal Hocko
On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote:
> On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
>  wrote:
> > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> >> Michal Hocko  wrote:
> >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> >> > > Kirill A. Shutemov  wrote:
> >> > [...]
> >> > > > I hate what I'm saying, but I guess we need some tunable here.
> >> > > > Not sure what exactly.
> >> > >
> >> > > Would memcg help?
> >> >
> >> > That really depends. I would have to check whether vmalloc path obeys
> >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> >> > that shouldn't be a big deal). But then the other potential problem is
> >> > the life time of the xt_table_info (or other potentially large) data
> >> > structures. Are they bound to any process life time?
> >>
> >> No.
> >
> > Well, IIUC they are bound to the net namespace life time, so killing all
> > processes in the namespace would help to get memory back. :)
> 
> ... unless the namespace is mounted into file system.
> 
> Let's start with NOWARN as that's what kernel generally uses for
> allocations with user-controllable size. ENOMEM is roughly as
> informative as the WARNING message in this case.

You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
right now. More specifically kvmalloc doesn't guarantee that the request
will not trigger the OOM killer (like regular __GFP_NORETRY). This is
because of internal vmalloc restrictions. If you are however OK to
simply bail out in most cases then __GFP_NORETRY should work reasonably
fine.

> I think we also need to consider setting up memory cgroup for
> syzkaller test processes (we do RLIMIT_AS, but that's weak).

Well, this is not about syzkaller, it merely pointed out a potential
DoS... And that has to be addressed somehow.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v6 07/36] nds32: Exception handling

2018-01-30 Thread Vincent Chen
2018-01-24 19:10 GMT+08:00 Arnd Bergmann :
> On Wed, Jan 24, 2018 at 12:09 PM, Arnd Bergmann  wrote:
>> On Wed, Jan 24, 2018 at 11:53 AM, Vincent Chen  wrote:
>>> 2018-01-18 18:14 GMT+08:00 Arnd Bergmann :
>
>> Ok. I still wonder about the kernel part of this though: is it a good idea
>> for user space to configure whether the kernel does unaligned
>> accesses? I would think that the kernel should just be fixed in such
>> a case.
>
> To clarify: I'm asking only about unaligned accesses from kernel code itself,
> which is generally considered a bug when
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is disabled.
>
>   Arnd

Thanks for your comments.

For performance, we have decided to always disable
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS even if the hardware supports
unaligned access. Therefore, I will remove the handling of kernel unaligned
accesses from nds32/mm/alignment.c. In other words, alignment.c will only
handle unaligned accesses from user space.

Vincent


Re: [PATCH bpf-next v7 3/5] libbpf: add error reporting in XDP

2018-01-30 Thread Daniel Borkmann
Hi Eric,

On 01/27/2018 11:32 AM, Eric Leblond wrote:
> On Sat, 2018-01-27 at 02:28 +0100, Daniel Borkmann wrote:
>> On 01/25/2018 01:05 AM, Eric Leblond wrote:
>>> Parse netlink ext attribute to get the error message returned by
>>> the card. Code is partially take from libnl.
>>>
>>> We add netlink.h to the uapi include of tools. And we need to
>>> avoid include of userspace netlink header to have a successful
>>> build of sample so nlattr.h has a define to avoid
>>> the inclusion. Using a direct define could have been an issue
>>> as NLMSGERR_ATTR_MAX can change in the future.
>>>
>>> We also define SOL_NETLINK if not defined to avoid to have to
>>> copy socket.h for a fixed value.
>>>
>>> Signed-off-by: Eric Leblond 
>>> Acked-by: Alexei Starovoitov 
>>>
>>> remote rtne
>>>
>>> Signed-off-by: Eric Leblond 
>>
>> Some leftover artifact from squashing commits?
> 
> Outch
> 
>>>  samples/bpf/Makefile   |   2 +-
>>>  tools/lib/bpf/Build    |   2 +-
>>>  tools/lib/bpf/bpf.c    |  13 +++-
>>>  tools/lib/bpf/nlattr.c | 187 +
>>>  tools/lib/bpf/nlattr.h |  72 +++
>>>  5 files changed, 273 insertions(+), 3 deletions(-)
>>>  create mode 100644 tools/lib/bpf/nlattr.c
>>>  create mode 100644 tools/lib/bpf/nlattr.h
>>>
>>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>>> index 7f61a3d57fa7..5c4cd3745282 100644
>>> --- a/samples/bpf/Makefile
>>> +++ b/samples/bpf/Makefile
>>> @@ -45,7 +45,7 @@ hostprogs-y += xdp_rxq_info
>>>  hostprogs-y += syscall_tp
>>>  
>>>  # Libbpf dependencies
>>> -LIBBPF := ../../tools/lib/bpf/bpf.o
>>> +LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
>>>  CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
>>>  
>>>  test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
>>> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
>>> index d8749756352d..64c679d67109 100644
>>> --- a/tools/lib/bpf/Build
>>> +++ b/tools/lib/bpf/Build
>>> @@ -1 +1 @@
>>> -libbpf-y := libbpf.o bpf.o
>>> +libbpf-y := libbpf.o bpf.o nlattr.o
>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>> index 749a447ec9ed..765fd95b0657 100644
>>> --- a/tools/lib/bpf/bpf.c
>>> +++ b/tools/lib/bpf/bpf.c
>>> @@ -27,7 +27,7 @@
>>>  #include "bpf.h"
>>>  #include "libbpf.h"
>>>  #include "nlattr.h"
>>> -#include <uapi/linux/rtnetlink.h>
>>> +#include <linux/rtnetlink.h>
>>
>> Okay, so here it's put back from prior added uapi/linux/rtnetlink.h
>> into linux/rtnetlink.h. Could you add this properly in the first
>> commit rather than relative adjustment/fix within the same set?
> 
> Yes, sure.
> 
>>>  #include 
>>>  #include 
>>>  
>>> @@ -37,6 +37,10 @@
>>>  #define IFLA_XDP_FLAGS 3
>>>  #endif
>>>  
>>> +#ifndef SOL_NETLINK
>>> +#define SOL_NETLINK 270
>>> +#endif
>>
>> This would need include/linux/socket.h into tools/ include infra
>> as well, no?
> 
> Yes, and I fear a lot of dependencies.

Sorry for the delay! So, once you pull these two headers in, are there more
follow-up dependencies with other headers required on your old test system?
I'd also be fine with keeping SOL_NETLINK here as is, but I would try to
have the if_link.h in the tools/ include infra, so we have all XDP definitions
available and up to date from the lib's pov. The dependency for if_link.h
seems minimal and it is probably enough to only pull in this single header.
Could you give that a try?

Thanks a lot,
Daniel


Re: r8169 take too long to complete driver initialization

2018-01-30 Thread Chris Chiu
On Mon, Jan 29, 2018 at 11:24 PM, Hau  wrote:
> Hi Chris,
>
> Could you test following patch?
>
>  DECLARE_RTL_COND(rtl_ocp_tx_cond)
>  {
> void __iomem *ioaddr = tp->mmio_addr;
>
> -   return RTL_R8(IBISR0) & 0x02;
> +   return RTL_R8(IBISR0) & 0x20;
>  }
>
>  static void rtl8168ep_stop_cmac(struct rtl8169_private *tp)
>  {
> void __iomem *ioaddr = tp->mmio_addr;
>
> RTL_W8(IBCR2, RTL_R8(IBCR2) & ~0x01);
> -   rtl_msleep_loop_wait_low(tp, &rtl_ocp_tx_cond, 50, 2000);
> +   rtl_msleep_loop_wait_high(tp, &rtl_ocp_tx_cond, 50, 2000);
> RTL_W8(IBISR0, RTL_R8(IBISR0) | 0x20);
> RTL_W8(IBCR0, RTL_R8(IBCR0) & ~0x01);
>  }
>
> Thanks.
>

Yes. It completes the initialization in 70 ms. So does that mean rtl_ocp_tx_cond
was waiting on the wrong register bit? Could you help work out a patch for this?

Chris
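
To make the effect of the proposed change concrete, here is a small userspace
model of the polling loop. It is a sketch under assumptions: the register
snapshot 0x22 is hypothetical (the thread does not state the actual IBISR0
value), the function names are made up, and the per-iteration sleep is
omitted. The loop follows the usual "poll until the condition equals the
wanted level" pattern that the wait_low/wait_high helpers suggest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of IBISR0; not taken from the thread. */
static uint8_t fake_ibisr0 = 0x22;

/* Old condition: bit 1 (mask 0x02). */
static bool cond_mask_02(void) { return (fake_ibisr0 & 0x02) != 0; }

/* Proposed condition: bit 5 (mask 0x20). */
static bool cond_mask_20(void) { return (fake_ibisr0 & 0x20) != 0; }

/* Poll check() up to n times until it equals `high`.
 * Returns the number of polls used, or -1 on timeout.
 * A real driver would sleep for the delay before each poll. */
static int loop_wait(bool (*check)(void), int n, bool high)
{
    for (int i = 0; i < n; i++) {
        if (check() == high)
            return i + 1;
    }
    return -1;
}
```

With this snapshot, `loop_wait(cond_mask_02, 2000, false)` runs all 2000
iterations and times out, which at 50 ms per iteration corresponds to the
100 seconds reported earlier, while `loop_wait(cond_mask_20, 2000, true)`
returns after a single poll.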


>> -Original Message-
>> From: Chris Chiu [mailto:c...@endlessm.com]
>> Sent: Monday, January 29, 2018 6:12 PM
>> To: nic_swsd ; netdev@vger.kernel.org; Linux
>> Kernel ; Linux Upstreaming Team
>> 
>> Subject: Re: r8169 take too long to complete driver initialization
>>
>> On Fri, Jan 5, 2018 at 10:17 AM, Chris Chiu  wrote:
>> > On Wed, Dec 20, 2017 at 4:41 PM, Chris Chiu  wrote:
>> >> Hi,
>> >> We've hit a suspend/resume issue on an Acer desktop caused by the
>> >> r8169 driver. The dmesg
>> >> https://gist.github.com/mschiu77/b741849b5070281daaead8dfee312d1a
>> >> shows it's still in msleep() within a mutex lock.
>> >> After looking into the code, it's caused by the
>> >> rtl8168ep_stop_cmac() which is waiting 100 seconds for
>> >> rtl_ocp_tx_cond. The following dmesg states that the r8169 driver is
>> >> loaded.
>> >>
>> >> [   20.270526] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>> >>
>> >> But it takes > 100 seconds to get the following messages
>> >>
>> >> [  140.400223] r8169 :02:00.0 (unnamed net_device)
>> >> (uninitialized): rtl_ocp_tx_cond == 1 (loop: 2000, delay: 50).
>> >> [  140.413294] r8169 :02:00.0 eth0: RTL8168ep/8111ep at
>> >> 0xb16c80db1000, f8:0f:41:ea:74:0d, XID 10200800 IRQ 46 [
>> >> 140.413297] r8169 :02:00.0 eth0: jumbo features [frames: 9200
>> >> bytes, tx checksumming: ko]
>> >>
>> >> So any trial to suspend the machine during this period would always
>> >> get device/resource busy message then abort. Is this  rtl_ocp_tx_cond
>> >> necessary? Because the ethernet is still working and I don't see any
>> >> problem. I don't know it should be considered normal or not. Please
>> >> let me know if any more information required. Thanks
>> >>
>> >> Chris
>> >
>> > gentle ping,
>> >
>> > cheers.
>>
>> Hi,
>> Just found an r8168 driver which seems to be authorized by Realtek, for
>> cross comparison. I tried applying the patch to the latest 4.15 kernel and
>> the driver finished its initialization in a fairly short time. The patch
>> file is here
>> https://gist.github.com/mschiu77/fcf406e64a1a437f46cf2be643f1057d.
>>
>> In mainline r8169.c, the IBISR0 register needs to be polled in
>> rtl8168ep_stop_cmac().
>> In the patch file, there's also the same IBISR0 polling code in
>> Dash2DisableTx(), but it's bypassed since the chipset matches
>> HW_DASH_SUPPORT_TYPE_2.
>> Per the rtl_chip_info[] in r8168_n.c, CFG_METHOD_23/27/28 are
>> HW_DASH_SUPPORT_TYPE_2, and they happen to be the only 3 named
>> RTL8168EP/8111EP in the rtl_chip_info[].
>>
>> To find the same matches in r8169.c, RTL_GIGA_MAC_VER_49/50/51
>> seem to share the same config. Can anyone clarify if rtl_ocp_tx_cond()
>> is really necessary for 8168EP/8111EP?
>> Or can we just ignore the condition check for RTL_GIGA_MAC_VER_49/50/51?
>>
>> Chris
>>


Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread Daniel Axtens
Marcelo Ricardo Leitner  writes:

> Hi,
>
> On Tue, Jan 30, 2018 at 12:14:46PM +1100, Daniel Axtens wrote:
>> If you take a GSO skb, and split it into packets, will the MAC
>> length (L2 + L3 + L4 headers + payload) of those packets be small
>> enough to fit within a given length?
>> 
>> Move skb_gso_mac_seglen() to skbuff.h with other related functions
>> like skb_gso_network_seglen() so we can use it, and then create
>> skb_gso_validate_mac_len to do the full calculation.
>> 
>> Signed-off-by: Daniel Axtens 
>> ---
>>  include/linux/skbuff.h | 16 +
>>  net/core/skbuff.c      | 65 +++---
>>  net/sched/sch_tbf.c    | 10 
>>  3 files changed, 67 insertions(+), 24 deletions(-)
>> 
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index b8e0da6c27d6..242d6773c7c2 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -3287,6 +3287,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
>>  void skb_scrub_packet(struct sk_buff *skb, bool xnet);
>>  unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
>>  bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
>> +bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
>>  struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
>>  struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
>>  int skb_ensure_writable(struct sk_buff *skb, int write_len);
>> @@ -4120,6 +4121,21 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
>>  return hdr_len + skb_gso_transport_seglen(skb);
>>  }
>>  
>> +/**
>> + * skb_gso_mac_seglen - Return length of individual segments of a gso packet
>> + *
>> + * @skb: GSO skb
>> + *
>> + * skb_gso_mac_seglen is used to determine the real size of the
>> + * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
>> + * headers (TCP/UDP).
>> + */
>> +static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
>> +{
>> +unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
>> +return hdr_len + skb_gso_transport_seglen(skb);
>> +}
>> +
>>  /* Local Checksum Offload.
>>   * Compute outer checksum based on the assumption that the
>>   * inner checksum will be offloaded later.
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 01e8285aea73..55d84ab7d093 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4914,36 +4914,73 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
>>  EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
>>  
>>  /**
>> - * skb_gso_validate_mtu - Return in case such skb fits a given MTU
>> + * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
>>   *
>> - * @skb: GSO skb
>> - * @mtu: MTU to validate against
>> + * There are a couple of instances where we have a GSO skb, and we
>> + * want to determine what size it would be after it is segmented.
>>   *
>> - * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
>> - * once split.
>> + * We might want to check:
>> + * -L3+L4+payload size (e.g. IP forwarding)
>> + * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
>> + *
>> + * This is a helper to do that correctly considering GSO_BY_FRAGS.
>> + *
>> + * @seg_len: The segmented length (from skb_gso_*_seglen). In the
>> + *   GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
>> + *
>> + * @max_len: The maximum permissible length.
>> + *
>> + * Returns true if the segmented length <= max length.
>>   */
>> -bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
>> -{
>> +static inline bool skb_gso_size_check(const struct sk_buff *skb,
>> +  unsigned int seg_len,
>> +  unsigned int max_len) {
>>  const struct skb_shared_info *shinfo = skb_shinfo(skb);
>>  const struct sk_buff *iter;
>> -unsigned int hlen;
>> -
>> -hlen = skb_gso_network_seglen(skb);
>>  
>>  if (shinfo->gso_size != GSO_BY_FRAGS)
>> -return hlen <= mtu;
>> +return seg_len <= max_len;
>>  
>>  /* Undo this so we can re-use header sizes */
>> -hlen -= GSO_BY_FRAGS;
>> +seg_len -= GSO_BY_FRAGS;
>>  
>>  skb_walk_frags(skb, iter) {
>> -if (hlen + skb_headlen(iter) > mtu)
>> +if (seg_len + skb_headlen(iter) > max_len)
>>  return false;
>>  }
>>  
>>  return true;
>>  }
>> -EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
>> +
>> +/**
>> + * skb_gso_validate_mtu - Return in case such skb fits a given MTU
>> + *
>> + * @skb: GSO skb
>> + * @mtu: MTU to validate against
>> + *
>> + * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
>> + * once split.
>> + */
>> +bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
>> +{
>> +return skb_gso_size_check(skb,
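
For readers following the quoted hunk, the size check can be modelled in user space roughly as below. This is only a sketch under stated assumptions: struct gso_model and its fields are hypothetical stand-ins for the skb and its frag list, and GSO_BY_FRAGS is defined locally as a sentinel rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel meaning "segment sizes come from the frag list" (local to this sketch). */
#define GSO_BY_FRAGS 0xFFFFu

/* Hypothetical user-space stand-in for the fields the check looks at. */
struct gso_model {
	unsigned int hdr_len;          /* header bytes carried by every segment */
	unsigned int gso_size;         /* payload per segment, or GSO_BY_FRAGS */
	const unsigned int *frag_len;  /* per-fragment payload lengths */
	size_t nr_frags;
};

/* Returns true if every segment, headers included, is <= max_len. */
bool gso_size_check(const struct gso_model *m, unsigned int max_len)
{
	unsigned int seg_len;
	size_t i;

	assert(m != NULL);
	seg_len = m->hdr_len + m->gso_size;

	if (m->gso_size != GSO_BY_FRAGS)
		return seg_len <= max_len;

	/* Undo the sentinel so only the header size remains. */
	seg_len -= GSO_BY_FRAGS;

	for (i = 0; i < m->nr_frags; i++) {
		if (seg_len + m->frag_len[i] > max_len)
			return false;
	}
	return true;
}
```

The same helper serves both callers described in the comment: pass the L3+L4 header length for an MTU check, or the L2+L3+L4 header length for a MAC-length check.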

[PATCH net,stable] qmi_wwan: Add support for Quectel EP06

2018-01-30 Thread Kristian Evensen
The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as
the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work.

Signed-off-by: Kristian Evensen 
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index ae0580b577b8..76ac48095c29 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1245,6 +1245,7 @@ static const struct usb_device_id products[] = {
{QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */
{QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */
{QMI_FIXED_INTF(0x2c7c, 0x0296, 4)},/* Quectel BG96 */
+   {QMI_QUIRK_SET_DTR(0x2c7c, 0x0306, 4)}, /* Quectel EP06 Mini PCIe */
 
/* 4. Gobi 1000 devices */
{QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},/* Acer Gobi Modem Device */
-- 
2.14.1



Re: Re: [RFC] net: qcom/emac: mdiobus-dev fwnode should point to emac-adev

2018-01-30 Thread Andrew Lunn
On Fri, Jan 26, 2018 at 07:20:42AM +, Wang, Dongsheng wrote:
> Hi, Timur && Andrew,
> 
> Please correct me if there is any problem with my understanding.
> 
> GPIO is a general property of devices, the property point to
> an entity such as device tree or ACPI table, we also can directly
> implement it in device node.
> 
> For ACPI, there is _DSD that should include GPIO property if we need it.
> No matter which devices implement it, MAC or MDIO also can implement a 
> _DSD.
> We can explicitly define a GPIO property in MDIO, but I think this
> may conflict with the existing definition of ACPI. ACPI guys may not
> agree to do this because there already has _DSD. We just need to use
> _DSD to notify GPIO layer there may have a Device Specific Data for
> this device.
> 
> So far MDIO owns external PHY "reset" as an optional feature and MDIO
> is integrated in MAC, so we need to point the MAC adev to 
> MDIO->dev.fwnode.
> And most importantly this feature does not depend on SoC, this feature
> depends on MotherBoard design.

Hi Wang

We have two different GPIO reset lines within the MDIO/PHY layer.  If
the GPIO is at the MDIO bus node of DT, the reset applies to all PHYs
connected to the bus. This is the code in __mdiobus_register().  If the
GPIO is in the PHY node, the reset applies to just that PHY. This is
the code in mdiobus_register_gpiod() & mdio_device_reset().

These resets are used at different times. The MDIO reset is used once,
before probing the MDIO bus. The PHY reset is used before probing the
individual PHY.

Does your GPIO reset one PHY, or all the PHYs? This determines where
it belongs. Is it the ACPI device which represents the MDIO bus, or
the ACPI device which represents the PHY?

You need to extend the functions I listed above to look in your ACPI
tables to find the _DSD properties which include the GPIO information.

Please work with Marcin Wojtas to achieve this. We will not accept
anything other than a generic solution which works for everybody.

 Andrew


Re: [PATCH v6 07/36] nds32: Exception handling

2018-01-30 Thread Arnd Bergmann
On Tue, Jan 30, 2018 at 11:01 AM, Vincent Chen  wrote:
> 2018-01-24 19:10 GMT+08:00 Arnd Bergmann :
>> On Wed, Jan 24, 2018 at 12:09 PM, Arnd Bergmann  wrote:
>>> On Wed, Jan 24, 2018 at 11:53 AM, Vincent Chen  wrote:
>>>> 2018-01-18 18:14 GMT+08:00 Arnd Bergmann :
>>
>>> Ok. I still wonder about the kernel part of this though: is it a good idea
>>> for user space to configure whether the kernel does unaligned
>>> accesses? I would think that the kernel should just be fixed in such
>>> a case.
>>
>> To clarify: I'm asking only about unaligned accesses from kernel code itself,
>> which is generally considered a bug when
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is disabled.
>>
>>   Arnd
>
> Thanks for your comments.
>
> For performance, we decide always disable
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS even if hardware supports
> unaligned accessing. Therefore, I will remove kernel unaligned accessing from
> nds32/mm/alignment.c. In other words, alignment.c only addresses unaligned
> accessing for user space.

I'm not really following that logic, let's go through that again so I understand
the situation better.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS should be set if and
only if you have a CPU that does not need to trap on unaligned accesses.

What are the hardware capabilities on nds32? Do you have all three
categories:

a) some CPUs that always trap on unaligned access
b) some CPUs that never trap on unaligned access
c) some CPUs that can be configured to either trap or not trap by
the kernel?

Arnd


Re: possible deadlock in do_ip_setsockopt

2018-01-30 Thread Florian Westphal
#syz dup: possible deadlock in do_ip_getsockopt


Re: possible deadlock in do_ipv6_setsockopt

2018-01-30 Thread Florian Westphal
#syz dup: possible deadlock in do_ip_getsockopt


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Michal Hocko
On Tue 30-01-18 10:57:39, Michal Hocko wrote:
> On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote:
> > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
> >  wrote:
> > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> > >> Michal Hocko  wrote:
> > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > >> > > Kirill A. Shutemov  wrote:
> > >> > [...]
> > >> > > > I hate what I'm saying, but I guess we need some tunable here.
> > >> > > > Not sure what exactly.
> > >> > >
> > >> > > Would memcg help?
> > >> >
> > >> > That really depends. I would have to check whether vmalloc path obeys
> > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > >> > that shouldn't be a big deal). But then the other potential problem is
> > >> > the life time of the xt_table_info (or other potentially large) data
> > >> > structures. Are they bound to any process life time.
> > >>
> > >> No.
> > >
> > > Well, IIUC they are bound to net namespace life time, so killing all
> > > processes in the namespace would help to get memory back. :)
> > 
> > ... unless the namespace is mounted into file system.
> > 
> > Let's start with NOWARN as that's what kernel generally uses for
> > allocations with user-controllable size. ENOMEM is roughly as
> > informative as the WARNING message in this case.
> 
> You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
> right now. More specifically kvmalloc doesn't guarantee that the request
> will not trigger the OOM killer (like regular __GFP_NORETRY). This is
> because of internal vmalloc restrictions. If you are however OK to
> simply bail out in most cases then __GFP_NORETRY should work reasonably
> fine.
> 
> > I think we also need to consider setting up memory cgroup for
> > syzkaller test processes (we do RLIMIT_AS, but that's weak).
> 
> Well, this is not about syzkaller, it merely pointed out a potential
> DoS... And that has to be addressed somehow.

So how about this?
---
From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Tue, 30 Jan 2018 14:51:07 +0100
Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of
memory. This is an admin only interface but an admin in a namespace
is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
kvmalloc() in xt_alloc_table_info()") has changed the opencoded
kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
the way because vmalloc has simply never fully supported __GFP_NORETRY
semantics. This is still the case because e.g. page tables backing the
vmalloc area are hardcoded GFP_KERNEL.

Revert to __GFP_NORETRY as a poor man's defence against excessively
large allocation requests here. We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request
in most cases.

Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
Signed-off-by: Michal Hocko 
---
 net/netfilter/x_tables.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d8571f414208..a5f5c29bcbdc 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
size)
if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
return NULL;
 
-   info = kvmalloc(sz, GFP_KERNEL);
+   /*
+* __GFP_NORETRY is not fully supported by kvmalloc but it should
+* work reasonably well if sz is too large and bail out rather
+* than shoot all processes down before realizing there is nothing
+* more to reclaim.
+*/
+   info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
if (!info)
return NULL;
 
-- 
2.15.1

-- 
Michal Hocko
SUSE Labs


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Florian Westphal
> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Tue, 30 Jan 2018 14:51:07 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

Acked-by: Florian Westphal 


Re: [PATCH] ipmr: Fix ptrdiff_t print formatting

2018-01-30 Thread David Miller
From: James Hogan 
Date: Tue, 30 Jan 2018 09:48:02 +

> ipmr_vif_seq_show() prints the difference between two pointers with the
> format string %2zd (z for size_t), however the correct format string is
> %2td instead (t for ptrdiff_t).
> 
> The same bug in ip6mr_vif_seq_show() was already fixed long ago by
> commit d430a227d272 ("bogus format in ip6mr").
> 
> Signed-off-by: James Hogan 

Applied, thanks.
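
For reference, the distinction the patch relies on can be shown with a small user-space sketch. The struct vif type and the format_vif_index() helper are hypothetical names made up for this illustration; only the printf length modifiers are standard C (C99): 't' matches ptrdiff_t, 'z' matches size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one entry of a virtual interface table. */
struct vif {
	int flags;
};

/*
 * Format the index of 'entry' within 'table'. Subtracting two pointers
 * yields a ptrdiff_t, so the matching length modifier is 't'.
 */
int format_vif_index(char *buf, size_t len,
		     const struct vif *table, const struct vif *entry)
{
	ptrdiff_t idx;

	assert(buf != NULL && table != NULL && entry != NULL);
	idx = entry - table;

	return snprintf(buf, len, "%2td", idx);
}
```

On most ABIs size_t and ptrdiff_t have the same width, so the wrong modifier tends to go unnoticed until a compiler with format checking flags it.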


[PATCH] netfilter: fix out-of-bounds accesses in clusterip_tg_check()

2018-01-30 Thread Dmitry Vyukov
Commit 136e92bbec0a switched local_nodes from an array to a bitmask
but did not add proper bounds checks. As a result,
clusterip_config_init_nodelist() can both over-read
ipt_clusterip_tgt_info.local_nodes and over-write
clusterip_config.local_nodes.

Add bounds checks for both.

Signed-off-by: Dmitry Vyukov 
Fixes: 136e92bbec0a ("[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data")
Reported-by: syzbot 
Cc: netfilter-de...@vger.kernel.org
Cc: coret...@netfilter.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: syzkal...@googlegroups.com
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 69060e3abe85..1e4a7209a3d2 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -431,7 +431,7 @@ static int clusterip_tg_check(const struct xt_tgchk_param 
*par)
struct ipt_clusterip_tgt_info *cipinfo = par->targinfo;
const struct ipt_entry *e = par->entryinfo;
struct clusterip_config *config;
-   int ret;
+   int ret, i;
 
if (par->nft_compat) {
pr_err("cannot use CLUSTERIP target from nftables compat\n");
@@ -450,8 +450,18 @@ static int clusterip_tg_check(const struct xt_tgchk_param 
*par)
pr_info("Please specify destination IP\n");
return -EINVAL;
}
-
-   /* FIXME: further sanity checks */
+   if (cipinfo->num_local_nodes > ARRAY_SIZE(cipinfo->local_nodes)) {
+   pr_info("bad num_local_nodes %u\n", cipinfo->num_local_nodes);
+   return -EINVAL;
+   }
+   for (i = 0; i < cipinfo->num_local_nodes; i++) {
+   if (cipinfo->local_nodes[i] - 1 >=
+   sizeof(config->local_nodes) * 8) {
+   pr_info("bad local_nodes[%d] %u\n",
+   i, cipinfo->local_nodes[i]);
+   return -EINVAL;
+   }
+   }
 
config = clusterip_config_find_get(par->net, e->ip.dst.s_addr, 1);
if (!config) {
-- 
2.16.0.rc1.238.g530d649a79-goog
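
The two checks added by the patch can be tried out in user space with a sketch like the one below. It is an illustration only: struct tgt_info, MAX_LOCAL_NODES and nodes_valid() are hypothetical names, and the array capacity of 16 is an assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOCAL_NODES 16	/* assumed array capacity for this sketch */

/* Simplified stand-in for the user-supplied target info. */
struct tgt_info {
	uint16_t num_local_nodes;
	uint16_t local_nodes[MAX_LOCAL_NODES];
};

/*
 * Validate the node list and, if it is sane, build the bitmap.
 * Check 1 prevents over-reading local_nodes[]; check 2 prevents
 * setting a bit outside the bitmap.
 */
bool nodes_valid(const struct tgt_info *info, unsigned long *bitmap_out)
{
	const unsigned int nbits = sizeof(*bitmap_out) * CHAR_BIT;
	unsigned long bitmap = 0;
	unsigned int i;

	assert(info != NULL && bitmap_out != NULL);

	if (info->num_local_nodes > MAX_LOCAL_NODES)
		return false;

	for (i = 0; i < info->num_local_nodes; i++) {
		/* Node numbers are 1-based; 0 wraps to UINT_MAX and is rejected. */
		unsigned int bit = (unsigned int)info->local_nodes[i] - 1u;

		if (bit >= nbits)
			return false;
		bitmap |= 1UL << bit;
	}

	*bitmap_out = bitmap;
	return true;
}
```

Doing the subtraction in unsigned arithmetic lets a single comparison reject both node number 0 and node numbers beyond the bitmap width.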



Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Michal Hocko
On Tue 30-01-18 15:01:11, Florian Westphal wrote:
> > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> > From: Michal Hocko 
> > Date: Tue, 30 Jan 2018 14:51:07 +0100
> > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> 
> Acked-by: Florian Westphal 

Thanks! How should we route this change? Andrew, David?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v6 07/36] nds32: Exception handling

2018-01-30 Thread Greentime Hu
Hi, Arnd:

2018-01-30 21:33 GMT+08:00 Arnd Bergmann :
> On Tue, Jan 30, 2018 at 11:01 AM, Vincent Chen  wrote:
>> 2018-01-24 19:10 GMT+08:00 Arnd Bergmann :
>>> On Wed, Jan 24, 2018 at 12:09 PM, Arnd Bergmann  wrote:
>>>> On Wed, Jan 24, 2018 at 11:53 AM, Vincent Chen  wrote:
>>>>> 2018-01-18 18:14 GMT+08:00 Arnd Bergmann :
>>>
>>>> Ok. I still wonder about the kernel part of this though: is it a good idea
>>>> for user space to configure whether the kernel does unaligned
>>>> accesses? I would think that the kernel should just be fixed in such
>>>> a case.
>>>
>>> To clarify: I'm asking only about unaligned accesses from kernel code 
>>> itself,
>>> which is generally considered a bug when
>>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is disabled.
>>>
>>>   Arnd
>>
>> Thanks for your comments.
>>
>> For performance, we decide always disable
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS even if hardware supports
>> unaligned accessing. Therefore, I will remove kernel unaligned accessing from
>> nds32/mm/alignment.c. In other words, alignment.c only addresses unaligned
>> accessing for user space.
>
> I'm not really following that logic, let's go through that again so I 
> understand
> the situation better.
>
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS should be set if and
> only if you have a CPU that does not need to trap on unaligned accesses.
>
> What are the hardware capabilities on nds32? Do you have all three
> categories:
>
> a) some CPUs that always trap on unaligned access
> b) some CPUs that never trap on unaligned access
> c) some CPUs that can be configured to either trap or not trap by
> the kernel?
>
We have types a and c.
We use CONFIG_ALIGNMENT_TRAP for a and
CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS for c.

Since unaligned access from kernel code itself should be considered a
bug, we will remove the emulation code that handles unaligned accesses
from kernel code.
We think CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and
CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS have different purposes, because
byte-wise access will still be more efficient even if the hardware
supports unaligned access.
CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS is used to prevent the hardware
from generating unaligned access exceptions.

Thus, we will
1. treat unaligned access in kernel code itself as a bug
2. not select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
3. disable CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS by default


[PATCH] esp4: remove redundant initialization of pointer esph

2018-01-30 Thread Colin King
From: Colin Ian King 

Pointer esph is being assigned a value that is never read, esph is
re-assigned and only read inside an if statement, hence the
initialization is redundant and can be removed.

Cleans up clang warning:
net/ipv4/esp4.c:657:21: warning: Value stored to 'esph' during
its initialization is never read

Signed-off-by: Colin Ian King 
---
 net/ipv4/esp4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 296d0b956bfe..97689012b357 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -654,7 +654,7 @@ static void esp_input_restore_header(struct sk_buff *skb)
 static void esp_input_set_header(struct sk_buff *skb, __be32 *seqhi)
 {
struct xfrm_state *x = xfrm_input_state(skb);
-   struct ip_esp_hdr *esph = (struct ip_esp_hdr *)skb->data;
+   struct ip_esp_hdr *esph;
 
/* For ESN we move the header forward by 4 bytes to
 * accomodate the high bits.  We will move it back after
-- 
2.15.1



Re: after adding > 200vlans to mlx nic no traffic

2018-01-30 Thread Gal Pressman
On 30-Jan-18 02:29, Paweł Staszewski wrote:
> Weird thing with mellanox mlx5 (connectx-4) kernel 4.15-rc9 - from net-next 
> davem tree
> 
> 
> 
> after:
> 
> ip link add link enp175s0f1 name vlan1538 type vlan id 1538
> 
> ip link set up dev vlan1538
> 
> 
> traffic on vlan is working
> 
> 
> But after
> 
> VID="1160 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 
> 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 
> 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 
> 1493 1494 1495 1496 1497 1498 1499 1500
> 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 
> 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 
> 1531 1532 1534 1535 1394 1393 1550 1500 1526 1536 1537 1538 1539 1540 1542 
> 1541 1543 1544 1801 1546 1547 1548 1549
> 1735 3132 3143 3104 3125 3103 3115 3134 3105 3113 3141 4009 3144 3130 
> 1803 3146 3148 3109 1551 1552 1553 1554 1555 1556 1558 1559 1560 1561 1562 
> 1563 1564 1565 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 
> 1579 1580 1581 1582 1583 1584 1585 1586
>  1587 1588 1589 1591 1592 1593 1594 1595 1596 1597 1598 1599 1557 1545 2001 
> 250 4043 1806 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 
> 1613 1614 1615 1616 1617 1618 1619 1620 1621 1625 1626 1627 1628 1629 1630 
> 1631 1632 1634 1635 1636 1640 1641 1642
> 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 
> 1658 1659 1660 1661 1662 1663 1664 1665 1601 1666 1667 1668 1669 1670 1671 
> 1672 1673 1674 1676 1677 1678 1680 1681 1682 1683 1684 1685 1686 1687 1688 
> 1689 1690 1691 1692 1693 1694 1696 1697
> 1698 1712 1817 1869 1810 1814 1818 1855 1856 1857 1858 1859 1860 1861 
> 1862 1863 1864 1865 1866 1867 1868 1870 1871 1872 1873 1874 1875 1876 1877 
> 1878 1879 1880 1885 1890 1891 1892 1893 1894 1895 1898 1881 2190 2191 2192 
> 2193 2194 2195 2196 2197 2198 2199 2541
>  2542 2543 2544 2545 2546 2547 2548 2549 2550 2290"
> for i in $VID
> do
>     ip link add link enp175s0f1 name vlan$i type vlan id $i
> done
> 
> 
> And setting vlan 1538 up - there is no received traffic on this vlan.
> 
> 
> 
> So searching for broken things (last time same problem was with ixgbe)
> 
> ethtool -K enp175s0f1 rx-vlan-filter off
> 
> 
> And all vlans attached to this device start working
> 
> 
> 
Hi Pawel,
I tried to reproduce the issue in our local setups without success.
Can you please provide more information? Are there any errors in dmesg? Did you
configure anything else that might be relevant to this issue?
Do you know if this is a new regression in 4.15-rc9?

Try to send traffic over the vlans and sample the ethtool counters (ethtool -S
enp175s0f1) of the receiver mlx5 interface over time;
this might help us trace where the packets drop.

Thank you for reporting this,
Gal


Re: [PATCH v6 07/36] nds32: Exception handling

2018-01-30 Thread Arnd Bergmann
On Tue, Jan 30, 2018 at 3:49 PM, Greentime Hu  wrote:
> Hi, Arnd:
>
> 2018-01-30 21:33 GMT+08:00 Arnd Bergmann :
>> On Tue, Jan 30, 2018 at 11:01 AM, Vincent Chen  wrote:
>>> 2018-01-24 19:10 GMT+08:00 Arnd Bergmann :
>>>> On Wed, Jan 24, 2018 at 12:09 PM, Arnd Bergmann  wrote:
>>>>> On Wed, Jan 24, 2018 at 11:53 AM, Vincent Chen  wrote:
>>>>>> 2018-01-18 18:14 GMT+08:00 Arnd Bergmann :
>>>>
>>>>> Ok. I still wonder about the kernel part of this though: is it a good idea
>>>>> for user space to configure whether the kernel does unaligned
>>>>> accesses? I would think that the kernel should just be fixed in such
>>>>> a case.
>>>>
>>>> To clarify: I'm asking only about unaligned accesses from kernel code
>>>> itself,
>>>> which is generally considered a bug when
>>>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is disabled.
>>>>
>>>>   Arnd
>>>
>>> Thanks for your comments.
>>>
>>> For performance, we decide always disable
>>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS even if hardware supports
>>> unaligned accessing. Therefore, I will remove kernel unaligned accessing 
>>> from
>>> nds32/mm/alignment.c. In other words, alignment.c only addresses unaligned
>>> accessing for user space.
>>
>> I'm not really following that logic, let's go through that again so I 
>> understand
>> the situation better.
>>
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS should be set if and
>> only if you have a CPU that does not need to trap on unaligned accesses.
>>
>> What are the hardware capabilities on nds32? Do you have all three
>> categories:
>>
>> a) some CPUs that always trap on unaligned access
>> b) some CPUs that never trap on unaligned access
>> c) some CPUs that can be configured to either trap or not trap by
>> the kernel?
>>
> We have type a and c.
> We use CONFIG_ALIGNMENT_TRAP for a and
> CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS for c.

Ok, got it.

> Since unaligned access in kernel code itself should be considered as a
> bug, we will remove the emulation code to handle the kernel code
> unaligned accessed case.
> We think CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and
> CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS have different purposes because
> it will still be more efficient to access by byte even if hardware
> support unaligned access.
> CONFIG_HW_SUPPORT_UNALIGNMENT_ACCESS is used to prevent generating
> unaligned access exception.

Hmm, this is a bit tricky. Most architectures actually assume that those
two are the same, and nothing else has a
HW_SUPPORT_UNALIGNMENT_ACCESS option.

We do actually have a related problem on 32-bit ARM, where the current
generation of processors (ARMv6 and higher) support unaligned
accesses for almost all instructions with the exception of those
instructions that operate on multiple memory locations (ldm/stm
and ldrd/strd). We can control the use of those instructions in inline
assembler, and gcc never uses them when it knows that a pointer
is unaligned, but when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
is set, the kernel sometimes intentionally contains code sequences
that lead the compiler to believe that a variable is aligned when it
is not, so we end up needing a trap handler here.

We might at some point want to clean this up by going through
all uses of CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
and changing them in a way that leads to better results on both
arm32 and nds32.
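
As a user-space illustration of the alternative to relying on hardware unaligned access, the two usual portable patterns look roughly like this. The function names are made up for this sketch and are not kernel helpers; whether the compiler turns them into a single load or into byte accesses depends on the target and is not something this sketch controls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Load a 32-bit value in host byte order from a possibly unaligned
 * address. memcpy() makes no alignment assumption, so the compiler is
 * free to pick whatever access sequence is safe on the target.
 */
uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Load a little-endian 32-bit value one byte at a time. This never
 * performs a multi-byte access, so it cannot trap on alignment.
 */
uint32_t load_le32_bytes(const unsigned char *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] |
	       (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
}
```

Casting an odd address to uint32_t * and dereferencing it is the pattern both helpers avoid; that cast is what tells the compiler the pointer is aligned when it is not.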

   Arnd


Re: [PATCH net-next 1/1] rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK

2018-01-30 Thread kbuild test robot
Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_NEWLINK/20180130-230918
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All error/warnings (new ones prefixed by >>):

   net//core/rtnetlink.c: In function 'rtnl_newlink':
>> net//core/rtnetlink.c:2903:14: error: implicit declaration of function 
>> 'rtnl_link_get_net_capable'; did you mean 'rtnl_link_get_net'? 
>> [-Werror=implicit-function-declaration]
  dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
 ^
 rtnl_link_get_net
>> net//core/rtnetlink.c:2903:12: warning: assignment makes pointer from 
>> integer without a cast [-Wint-conversion]
  dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
   ^
   cc1: some warnings being treated as errors

vim +2903 net//core/rtnetlink.c

  2729  
  2730  static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
  2731  struct netlink_ext_ack *extack)
  2732  {
  2733  struct net *net = sock_net(skb->sk);
  2734  const struct rtnl_link_ops *ops;
  2735  const struct rtnl_link_ops *m_ops = NULL;
  2736  struct net_device *dev;
  2737  struct net_device *master_dev = NULL;
  2738  struct ifinfomsg *ifm;
  2739  char kind[MODULE_NAME_LEN];
  2740  char ifname[IFNAMSIZ];
  2741  struct nlattr *tb[IFLA_MAX+1];
  2742  struct nlattr *linkinfo[IFLA_INFO_MAX+1];
  2743  unsigned char name_assign_type = NET_NAME_USER;
  2744  int err;
  2745  
  2746  #ifdef CONFIG_MODULES
  2747  replay:
  2748  #endif
  2749  err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, 
extack);
  2750  if (err < 0)
  2751  return err;
  2752  
  2753  if (tb[IFLA_IF_NETNSID])
  2754  return -EOPNOTSUPP;
  2755  
  2756  if (tb[IFLA_IFNAME])
  2757  nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
  2758  else
  2759  ifname[0] = '\0';
  2760  
  2761  ifm = nlmsg_data(nlh);
  2762  if (ifm->ifi_index > 0)
  2763  dev = __dev_get_by_index(net, ifm->ifi_index);
  2764  else {
  2765  if (ifname[0])
  2766  dev = __dev_get_by_name(net, ifname);
  2767  else
  2768  dev = NULL;
  2769  }
  2770  
  2771  if (dev) {
  2772  master_dev = netdev_master_upper_dev_get(dev);
  2773  if (master_dev)
  2774  m_ops = master_dev->rtnl_link_ops;
  2775  }
  2776  
  2777  err = validate_linkmsg(dev, tb);
  2778  if (err < 0)
  2779  return err;
  2780  
  2781  if (tb[IFLA_LINKINFO]) {
  2782  err = nla_parse_nested(linkinfo, IFLA_INFO_MAX,
  2783 tb[IFLA_LINKINFO], 
ifla_info_policy,
  2784 NULL);
  2785  if (err < 0)
  2786  return err;
  2787  } else
  2788  memset(linkinfo, 0, sizeof(linkinfo));
  2789  
  2790  if (linkinfo[IFLA_INFO_KIND]) {
  2791  nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], 
sizeof(kind));
  2792  ops = rtnl_link_ops_get(kind);
  2793  } else {
  2794  kind[0] = '\0';
  2795  ops = NULL;
  2796  }
  2797  
  2798  if (1) {
  2799  struct nlattr *attr[ops ? ops->maxtype + 1 : 1];
  2800  struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype 
+ 1 : 1];
  2801  struct nlattr **data = NULL;
  2802  struct nlattr **slave_data = NULL;
  2803  struct net *dest_net, *link_net = NULL;
  2804  
  2805  if (ops) {
  2806  if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
  2807  err = nla_parse_nested(attr, 
ops->maxtype,
  2808 
linkinfo[IFLA_INFO_DATA],
  2809 ops->policy, 
NULL);
  2810  if (err < 0)
  2811  re

Re: after adding > 200vlans to mlx nic no traffic

2018-01-30 Thread Paweł Staszewski



On 30.01.2018 at 15:57, Gal Pressman wrote:

> On 30-Jan-18 02:29, Paweł Staszewski wrote:
>> Weird thing with mellanox mlx5 (connectx-4) kernel 4.15-rc9 - from net-next
>> davem tree
>>
>> after:
>>
>> ip link add link enp175s0f1 name vlan1538 type vlan id 1538
>>
>> ip link set up dev vlan1538
>>
>> traffic on vlan is working
>>
>> But after
>>
>> VID="1160 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463
>> 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480
>> 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497
>> 1498 1499 1500
>> 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515
>> 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531
>> 1532 1534 1535 1394 1393 1550 1500 1526 1536 1537 1538 1539 1540 1542 1541 1543
>> 1544 1801 1546 1547 1548 1549
>> 1735 3132 3143 3104 3125 3103 3115 3134 3105 3113 3141 4009 3144 3130 1803
>> 3146 3148 3109 1551 1552 1553 1554 1555 1556 1558 1559 1560 1561 1562 1563 1564
>> 1565 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581
>> 1582 1583 1584 1585 1586
>> 1587 1588 1589 1591 1592 1593 1594 1595 1596 1597 1598 1599 1557 1545 2001
>> 250 4043 1806 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613
>> 1614 1615 1616 1617 1618 1619 1620 1621 1625 1626 1627 1628 1629 1630 1631 1632
>> 1634 1635 1636 1640 1641 1642
>> 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657
>> 1658 1659 1660 1661 1662 1663 1664 1665 1601 1666 1667 1668 1669 1670 1671 1672
>> 1673 1674 1676 1677 1678 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690
>> 1691 1692 1693 1694 1696 1697
>> 1698 1712 1817 1869 1810 1814 1818 1855 1856 1857 1858 1859 1860 1861 1862
>> 1863 1864 1865 1866 1867 1868 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879
>> 1880 1885 1890 1891 1892 1893 1894 1895 1898 1881 2190 2191 2192 2193 2194 2195
>> 2196 2197 2198 2199 2541
>> 2542 2543 2544 2545 2546 2547 2548 2549 2550 2290"
>> for i in $VID
>> do
>>     ip link add link enp175s0f1 name vlan$i type vlan id $i
>> done
>>
>> And setting vlan 1538 up - there is no received traffic on this vlan.
>>
>> So searching for broken things (last time same problem was with ixgbe)
>>
>> ethtool -K enp175s0f1 rx-vlan-filter off
>>
>> And all vlans attached to this device start working
>>
> Hi Pawel,
> I tried to reproduce the issue in our local setups without success.
> Can you please provide more information? Are there any errors in dmesg? Did you
> configure anything else that might be relevant to this issue?
> Do you know if this is a new regression in 4.15-rc9?

previous kernel used was 4.13.2 - without this problem.

current kernel is net-next 4.15.0-rc9+
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git


> Try to send traffic over the vlans and sample the ethtool counters (ethtool -S
> enp175s0f1) of the receiver mlx5 interface over time;
> this might help us trace where the packets drop.

Yes, traffic is going out from the interface, but there is nothing on RX:
tcpdump shows no packets arriving at the interface.

> Thank you for reporting this,
> Gal



Interface settings:
(working case with rx vlan filter turned off)
ethtool -k enp175s0f1
Features for enp175s0f1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
rx-gro-hw: off [fixed]

Coalesce parameters for enp175s0f1:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
dmac: 32571

rx-usecs: 256
rx-frames: 128
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 16
tx-frames: 32
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low:

Re: sctp netns "unregister_netdevice: waiting for lo to become free. Usage count = 1"

2018-01-30 Thread Neil Horman
On Mon, Jan 29, 2018 at 05:55:45PM +0200, Tommi Rantala wrote:
> Hi,
> 
> When running sctp_test from lksctp-tools in netns in 4.4 and 4.9 with
> suitable arguments, the local loopback device in the netns is not getting
> destroyed after deleting the netns.
> 
> For example:
> 
> ip netns add TEST
> ip netns exec TEST ip link set lo up
> ip link add dummy0 type dummy
> ip link add dummy1 type dummy
> ip link add dummy2 type dummy
> ip link set dev dummy0 netns TEST
> ip link set dev dummy1 netns TEST
> ip link set dev dummy2 netns TEST
> ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0
> ip netns exec TEST ip link set dummy0 up
> ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1
> ip netns exec TEST ip link set dummy1 up
> ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2
> ip netns exec TEST ip link set dummy2 up
> ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 2 -s -B 192.168.1.3
> ip netns del TEST
> 
> Results in:
> 
> [  354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1
> ...
> 
> Based on a quick test, 4.14 and 4.15 do not suffer from this, but it's
> reproducible e.g. in 4.4.113 and 4.9.75.
> 
> Any ideas?
> 
> Tommi
> 
Does the problem occur if you don't set lo up?

Neil



Re: [PATCH net-next 1/1] rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK

2018-01-30 Thread kbuild test robot
Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_NEWLINK/20180130-230918
config: i386-randconfig-a1-201804 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/core/rtnetlink.c: In function 'rtnl_newlink':
>> net/core/rtnetlink.c:2903:3: error: implicit declaration of function 
>> 'rtnl_link_get_net_capable' [-Werror=implicit-function-declaration]
  dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
  ^
   net/core/rtnetlink.c:2903:12: warning: assignment makes pointer from integer 
without a cast
  dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
   ^
   cc1: some warnings being treated as errors

vim +/rtnl_link_get_net_capable +2903 net/core/rtnetlink.c

  2729	
  2730	static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
  2731				struct netlink_ext_ack *extack)
  2732	{
  2733		struct net *net = sock_net(skb->sk);
  2734		const struct rtnl_link_ops *ops;
  2735		const struct rtnl_link_ops *m_ops = NULL;
  2736		struct net_device *dev;
  2737		struct net_device *master_dev = NULL;
  2738		struct ifinfomsg *ifm;
  2739		char kind[MODULE_NAME_LEN];
  2740		char ifname[IFNAMSIZ];
  2741		struct nlattr *tb[IFLA_MAX+1];
  2742		struct nlattr *linkinfo[IFLA_INFO_MAX+1];
  2743		unsigned char name_assign_type = NET_NAME_USER;
  2744		int err;
  2745	
  2746	#ifdef CONFIG_MODULES
  2747	replay:
  2748	#endif
  2749		err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy, extack);
  2750		if (err < 0)
  2751			return err;
  2752	
  2753		if (tb[IFLA_IF_NETNSID])
  2754			return -EOPNOTSUPP;
  2755	
  2756		if (tb[IFLA_IFNAME])
  2757			nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
  2758		else
  2759			ifname[0] = '\0';
  2760	
  2761		ifm = nlmsg_data(nlh);
  2762		if (ifm->ifi_index > 0)
  2763			dev = __dev_get_by_index(net, ifm->ifi_index);
  2764		else {
  2765			if (ifname[0])
  2766				dev = __dev_get_by_name(net, ifname);
  2767			else
  2768				dev = NULL;
  2769		}
  2770	
  2771		if (dev) {
  2772			master_dev = netdev_master_upper_dev_get(dev);
  2773			if (master_dev)
  2774				m_ops = master_dev->rtnl_link_ops;
  2775		}
  2776	
  2777		err = validate_linkmsg(dev, tb);
  2778		if (err < 0)
  2779			return err;
  2780	
  2781		if (tb[IFLA_LINKINFO]) {
  2782			err = nla_parse_nested(linkinfo, IFLA_INFO_MAX,
  2783					       tb[IFLA_LINKINFO], ifla_info_policy,
  2784					       NULL);
  2785			if (err < 0)
  2786				return err;
  2787		} else
  2788			memset(linkinfo, 0, sizeof(linkinfo));
  2789	
  2790		if (linkinfo[IFLA_INFO_KIND]) {
  2791			nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
  2792			ops = rtnl_link_ops_get(kind);
  2793		} else {
  2794			kind[0] = '\0';
  2795			ops = NULL;
  2796		}
  2797	
  2798		if (1) {
  2799			struct nlattr *attr[ops ? ops->maxtype + 1 : 1];
  2800			struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 1];
  2801			struct nlattr **data = NULL;
  2802			struct nlattr **slave_data = NULL;
  2803			struct net *dest_net, *link_net = NULL;
  2804	
  2805			if (ops) {
  2806				if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
  2807					err = nla_parse_nested(attr, ops->maxtype,
  2808							       linkinfo[IFLA_INFO_DATA],
  2809							       ops->policy, NULL);
  2810					if (err < 0)
  2811						return err;
  2812					data = attr;
  2813				}
  2814				if (ops->validate) {
  2815					err = ops->validate(tb, data, extack);
  2816 

Re: [PATCH net-next 1/1] rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK

2018-01-30 Thread Christian Brauner

On Wed, Jan 31, 2018 at 12:13:11AM +0800, kbuild test robot wrote:
> Hi Christian,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on net-next/master]
> 
> url:
> https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_NEWLINK/20180130-230918
> config: i386-randconfig-a1-201804 (attached as .config)
> compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>net/core/rtnetlink.c: In function 'rtnl_newlink':
> >> net/core/rtnetlink.c:2903:3: error: implicit declaration of function 
> >> 'rtnl_link_get_net_capable' [-Werror=implicit-function-declaration]
>   dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
>   ^

The patch is against Dave's net-next tree which already contains a
merged prior patch series from me which introduces the missing
function. So I'd say this is safe to ignore.

Thanks!
Christian

>net/core/rtnetlink.c:2903:12: warning: assignment makes pointer from 
> integer without a cast
>   dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
>^
>cc1: some warnings being treated as errors
> 

RE: [PATCH net] gianfar: prevent integer wrapping in the rx handler

2018-01-30 Thread Claudiu Manoil
>-Original Message-
>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>On Behalf Of David Miller
>Sent: Monday, January 29, 2018 9:18 PM
>To: aspen...@spacex.com
>Cc: netdev@vger.kernel.org; claudiu.man...@freescale.com
>Subject: Re: [PATCH net] gianfar: prevent integer wrapping in the rx handler
>
>From: Andy Spencer 
>Date: Thu, 25 Jan 2018 19:37:50 -0800
>
>> When the frame check sequence (FCS) is split across the last two frames
>> of a fragmented packet, part of the FCS gets counted twice, once when
>> subtracting the FCS, and again when subtracting the previously received
>> data.
>>
>> For example, if 1602 bytes are received, and the first fragment contains
>> the first 1600 bytes (including the first two bytes of the FCS), and the
>> second fragment contains the last two bytes of the FCS:
>>
>>   'skb->len == 1600' from the first fragment
>>
>>   size  = lstatus & BD_LENGTH_MASK; # 1602
>>   size -= ETH_FCS_LEN;  # 1598
>>   size -= skb->len; # -2
>>
>> Since the size is unsigned, it wraps around and causes a BUG later in
>> the packet handling, as shown below:
>>
>>   kernel BUG at ./include/linux/skbuff.h:2068!
>>   Oops: Exception in kernel mode, sig: 5 [#1]
>>   ...
>>   NIP [c021ec60] skb_pull+0x24/0x44
>>   LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
>>   Call Trace:
>>   [df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
>>   [df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
>>   [df7edf40] [c023352c] net_rx_action+0x21c/0x274
>>   [df7edf90] [c0329000] __do_softirq+0xd8/0x240
>>   [df7edff0] [c000c108] call_do_irq+0x24/0x3c
>>   [c0597e90] [c00041dc] do_IRQ+0x64/0xc4
>>   [c0597eb0] [c000d920] ret_from_except+0x0/0x18
>>   --- interrupt: 501 at arch_cpu_idle+0x24/0x5c
>>
>> Change the size to a signed integer and then trim off any part of the
>> FCS that was received prior to the last fragment.
>>
>> Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames")
>> Signed-off-by: Andy Spencer 
>
>Applied.

Good catch, thanks.
The fix is not pretty, but I don't see another way around this since this
hardware is not able to remove the FCS from the Rx frame.

Thanks,
Claudiu
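
For readers following the arithmetic in the quoted commit message, the
wrap-around can be reproduced in user space. This is only a minimal sketch
and not the driver code: ETH_FCS_LEN is 4 as in the kernel headers, but
rx_frag_size_unsigned() and rx_frag_size_signed() are hypothetical helper
names, and the clamping shown is an assumption modeled on the description
"change the size to a signed integer and then trim off any part of the FCS
that was received prior to the last fragment".

```c
#include <stdio.h>

#define ETH_FCS_LEN 4	/* length of the Ethernet frame check sequence */

/*
 * Buggy variant: everything is unsigned, so a result that should be
 * negative wraps around to a huge value.
 */
static unsigned int rx_frag_size_unsigned(unsigned int total_len,
					  unsigned int already_received)
{
	unsigned int size = total_len;

	size -= ETH_FCS_LEN;		/* drop the FCS */
	size -= already_received;	/* drop data from earlier fragments */
	return size;
}

/*
 * Signed variant (hypothetical helper): returns the payload bytes carried
 * by the last fragment and stores in *fcs_overrun how many FCS bytes were
 * already counted as data in earlier fragments and must be trimmed.
 */
static int rx_frag_size_signed(int total_len, int already_received,
			       int *fcs_overrun)
{
	int size = total_len - ETH_FCS_LEN - already_received;

	*fcs_overrun = 0;
	if (size < 0) {
		*fcs_overrun = -size;
		size = 0;
	}
	return size;
}
```

With the numbers from the commit message (1602 bytes in total, 1600 already
received), the unsigned helper returns (unsigned int)-2, which is 4294967294
when unsigned int is 32 bits wide, while the signed helper returns 0 payload
bytes and reports 2 FCS bytes to trim.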


[PATCH iproute2-next 0/6] ipaddress: Get rid of print_linkinfo_brief()

2018-01-30 Thread Serhey Popovych
With this series I propose to get rid of custom print_linkinfo_brief()
in favor of print_linkinfo() to avoid code duplication.

The changes in this series were tested using the following script:

iproute2_dir="$1"
iface='eth0.2'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
	DIR="/tmp/$i"
	mkdir -p "$DIR"

	ln -snf ip.$i ip/ip

	# normal
	ip/ip link show                  >"$DIR/ip-link-show"
	ip/ip -4 addr show               >"$DIR/ip-4-addr-show"
	ip/ip -6 addr show               >"$DIR/ip-6-addr-show"
	ip/ip addr show dev "$iface"     >"$DIR/ip-addr-show-$iface"

	# brief
	ip/ip -br link show              >"$DIR/ip-br-link-show"
	ip/ip -br -4 addr show           >"$DIR/ip-br-4-addr-show"
	ip/ip -br -6 addr show           >"$DIR/ip-br-6-addr-show"
	ip/ip -br addr show dev "$iface" >"$DIR/ip-br-addr-show-$iface"
done
rm -f ip/ip

diff -urN /tmp/{old,new}
rc=$?

popd &>/dev/null
exit $rc

Expected results : 
Actual results   : 

Although test coverage is far from ideal, in my opinion it covers the most
important aspects of the changes presented by the series.

All this work is done in preparation for iplink_get() enhancements to support
attribute parsing, which will ultimately be used to simplify the ip/tunnel
RTM_GETLINK code.

As always, reviews, comments, suggestions and criticism are welcome.

Thanks,
Serhii

Serhey Popovych (6):
  ipaddress: Improve print_linkinfo()
  ipaddress: Simplify print_linkinfo_brief() and it's usage
  lib: Correct object file dependencies
  utils: Introduce and use get_ifname_rta()
  utils: Introduce and use print_name_and_link() to print name@link
  ipaddress: Get rid of print_linkinfo_brief()

 bridge/link.c   |   21 ++
 include/utils.h |5 ++
 ip/ip_common.h  |3 -
 ip/ipaddress.c  |  210 ---
 ip/iplink.c |5 +-
 lib/Makefile|4 +-
 lib/utils.c |   70 +++
 7 files changed, 129 insertions(+), 189 deletions(-)

-- 
1.7.10.4



[PATCH iproute2-next 1/6] ipaddress: Improve print_linkinfo()

2018-01-30 Thread Serhey Popovych
There are a few places to improve:

  1) return -1 instead of zero when an entry is filtered, since zero means
     accept entry: ipaddr_list_flush_or_save() is the only user of this

  2) use ll_index_to_name() as a last resort to translate index to name:
     it will return the name in the form "if%d" in case of failure

  3) replace open coded access to IFLA_IFNAME attribute data by
 RTA_DATA() with rta_getattr_str()

  4) simplify ifname printing since name is never NULL, thanks to (2).

Signed-off-by: Serhey Popovych 
---
 ip/ipaddress.c |   30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 051a05f..f8fd392 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -948,14 +948,14 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
if (tb[IFLA_IFNAME] == NULL) {
fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", 
ifi->ifi_index);
-   name = "";
+   name = ll_index_to_name(ifi->ifi_index);
} else {
name = rta_getattr_str(tb[IFLA_IFNAME]);
}
 
if (pfilter->label &&
(!pfilter->family || pfilter->family == AF_PACKET) &&
-   fnmatch(pfilter->label, RTA_DATA(tb[IFLA_IFNAME]), 0))
+   fnmatch(pfilter->label, name, 0))
return -1;
 
if (tb[IFLA_GROUP]) {
@@ -1057,6 +1057,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
int len = n->nlmsg_len;
+   const char *name;
unsigned int m_flag = 0;
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
@@ -1067,18 +1068,22 @@ int print_linkinfo(const struct sockaddr_nl *who,
return -1;
 
if (filter.ifindex && ifi->ifi_index != filter.ifindex)
-   return 0;
+   return -1;
if (filter.up && !(ifi->ifi_flags&IFF_UP))
-   return 0;
+   return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL)
+   if (tb[IFLA_IFNAME] == NULL) {
fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", 
ifi->ifi_index);
+   name = ll_index_to_name(ifi->ifi_index);
+   } else {
+   name = rta_getattr_str(tb[IFLA_IFNAME]);
+   }
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
-   fnmatch(filter.label, RTA_DATA(tb[IFLA_IFNAME]), 0))
-   return 0;
+   fnmatch(filter.label, name, 0))
+   return -1;
 
if (tb[IFLA_GROUP]) {
int group = rta_getattr_u32(tb[IFLA_GROUP]);
@@ -1105,16 +1110,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
-   if (tb[IFLA_IFNAME]) {
-   print_color_string(PRINT_ANY,
-  COLOR_IFNAME,
-  "ifname", "%s",
-  rta_getattr_str(tb[IFLA_IFNAME]));
-   } else {
-   print_null(PRINT_JSON, "ifname", NULL, NULL);
-   print_color_null(PRINT_FP, COLOR_IFNAME,
-"ifname", "%s", "");
-   }
+   print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s", name);
 
if (tb[IFLA_LINK]) {
int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-- 
1.7.10.4
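
The return value convention from point (1) of the commit message can be
illustrated outside of iproute2. This is a rough sketch under stated
assumptions: struct sketch_link, struct sketch_filter, SKETCH_IFF_UP,
sketch_print_link() and sketch_list_links() are all hypothetical names made
up for illustration, and only the ifindex and "up" filters from the diff are
modeled. The point is that the caller can tell "filtered" (-1) apart from
"accepted and printed" (0).

```c
#include <stdio.h>

#define SKETCH_IFF_UP 0x1	/* stand-in for IFF_UP */

struct sketch_link {
	int ifindex;
	unsigned int flags;
};

struct sketch_filter {
	int ifindex;	/* 0 means "any interface" */
	int up;		/* non-zero means "only interfaces that are up" */
};

/* Returns -1 if the entry is filtered out, 0 if it was accepted. */
static int sketch_print_link(const struct sketch_link *link,
			     const struct sketch_filter *filter)
{
	if (filter->ifindex && link->ifindex != filter->ifindex)
		return -1;
	if (filter->up && !(link->flags & SKETCH_IFF_UP))
		return -1;

	printf("%d: if%d\n", link->ifindex, link->ifindex);
	return 0;
}

/* Caller: does follow-up work only for accepted entries, returns their count. */
static int sketch_list_links(const struct sketch_link *links, size_t n,
			     const struct sketch_filter *filter)
{
	int shown = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sketch_print_link(&links[i], filter) == 0)
			shown++;	/* e.g. print the addresses of this link */
	}
	return shown;
}
```

If a filtered entry returned 0 as before, the caller in this sketch would
count it as shown even though nothing was printed for it.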



[PATCH iproute2-next 5/6] utils: Introduce and use print_name_and_link() to print name@link

2018-01-30 Thread Serhey Popovych
There are at least three places implementing the same thing: two in
ipaddress.c, print_linkinfo() and print_linkinfo_brief(), and one in
bridge/link.c.

These implementations diverge from each other very little:
bridge/link.c does not support JSON output at the moment and
print_linkinfo_brief() does not handle the IFLA_LINK_NETNSID case.

Introduce and use the print_name_and_link() routine to handle name@link
output in all possible variations; respect the IFLA_LINK_NETNSID attribute
to handle the case when the link is in a different namespace; use the "if%d"
template for the interface name instead of "" to share logic with other
code (e.g. ll_name_to_index() and ll_index_to_name()) supporting such a
template.

Signed-off-by: Serhey Popovych 
---
 bridge/link.c   |   13 +++--
 include/utils.h |4 
 ip/ipaddress.c  |   48 ++--
 lib/utils.c |   51 +++
 4 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index a11cbb1..90c9734 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -125,20 +125,13 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
 
-   fprintf(fp, "%d: %s ", ifi->ifi_index,
-   tb[IFLA_IFNAME] ? rta_getattr_str(tb[IFLA_IFNAME]) : "");
+   fprintf(fp, "%d: ", ifi->ifi_index);
+
+   print_name_and_link("%s: ", COLOR_NONE, name, tb);
 
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-   if (tb[IFLA_LINK]) {
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   fprintf(fp, "@%s: ",
-   iflink ? ll_index_to_name(iflink) : "NONE");
-   } else
-   fprintf(fp, ": ");
-
print_link_flags(fp, ifi->ifi_flags);
 
if (tb[IFLA_MTU])
diff --git a/include/utils.h b/include/utils.h
index 5738c97..d217073 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -12,6 +12,7 @@
 #include "libnetlink.h"
 #include "ll_map.h"
 #include "rtm_map.h"
+#include "json_print.h"
 
 extern int preferred_family;
 extern int human_readable;
@@ -240,6 +241,9 @@ void print_escape_buf(const __u8 *buf, size_t len, const 
char *escape);
 int print_timestamp(FILE *fp);
 void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
 
+unsigned int print_name_and_link(const char *fmt, enum color_attr color,
+const char *name, struct rtattr *tb[]);
+
 #define BIT(nr) (1UL << (nr))
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 1797927..5afc9d4 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -926,7 +926,6 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
struct rtattr *tb[IFLA_MAX+1];
int len = n->nlmsg_len;
const char *name;
-   char buf[32] = { 0, };
unsigned int m_flag = 0;
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
@@ -976,26 +975,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
-   if (tb[IFLA_LINK]) {
-   SPRINT_BUF(b1);
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   if (iflink == 0) {
-   snprintf(buf, sizeof(buf), "%s@NONE", name);
-   print_null(PRINT_JSON, "link", NULL, NULL);
-   } else {
-   const char *link = ll_idx_n2a(iflink, b1);
-
-   print_string(PRINT_JSON, "link", NULL, link);
-   snprintf(buf, sizeof(buf), "%s@%s", name, link);
-   m_flag = ll_index_to_flags(iflink);
-   m_flag = !(m_flag & IFF_UP);
-   }
-   } else
-   snprintf(buf, sizeof(buf), "%s", name);
-
-   print_string(PRINT_FP, NULL, "%-16s ", buf);
-   print_string(PRINT_JSON, "ifname", NULL, name);
+   m_flag = print_name_and_link("%-16s ", COLOR_NONE, name, tb);
 
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
@@ -1102,31 +1082,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
-   print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s", name);
-
-   if (tb[IFLA_LINK]) {
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   if (iflink == 0)
-   print_null(PRINT_ANY, "link", "@%s: ", "NONE");
-   else {
-   if (tb[IFLA_LINK_NETNSID])
-   print_int(PRINT_ANY,
- "link_index", "@if%d: ", iflink);
-  
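
The name@link variations that the removed code handles can be summarized in
a small stand-alone formatter. This is only a sketch: the function
sketch_format_name_and_link() and its parameters are hypothetical and do not
reflect the real print_name_and_link() signature, which works on the rtattr
table and the JSON printing helpers. The flags passed in stand for "IFLA_LINK
is present" and "IFLA_LINK_NETNSID is present".

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "name", "name@NONE", "name@if<index>" or "name@<link name>" into
 * buf and return the snprintf() result.
 *
 * has_link:    non-zero if the link attribute is present
 * iflink:      index of the lower device (only used if has_link)
 * other_netns: non-zero if the lower device lives in another namespace
 * link_name:   resolved name of the lower device, or NULL if unknown
 */
static int sketch_format_name_and_link(char *buf, size_t len,
					const char *name, int has_link,
					int iflink, int other_netns,
					const char *link_name)
{
	if (!has_link)
		return snprintf(buf, len, "%s", name);

	if (iflink == 0)
		return snprintf(buf, len, "%s@NONE", name);

	/* An index from another namespace cannot be resolved locally. */
	if (other_netns || link_name == NULL)
		return snprintf(buf, len, "%s@if%d", name, iflink);

	return snprintf(buf, len, "%s@%s", name, link_name);
}
```

For example, a VLAN device eth0.2 stacked on eth0 in the same namespace
would be formatted as "eth0.2@eth0", while a veth whose peer has index 5 in
another namespace would be formatted as "veth0@if5".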

[PATCH iproute2-next 3/6] lib: Correct object file dependencies

2018-01-30 Thread Serhey Popovych
Neither internal libnetlink nor libgenl depends on ll_map.o: prepare for
upcoming changes that bring a much cleaner dependency between
utils.o and ll_map.o.

Signed-off-by: Serhey Popovych 
---
 lib/Makefile |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 7b34ed5..bab8cbf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -3,11 +3,11 @@ include ../config.mk
 
 CFLAGS += -fPIC
 
-UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
+UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o json_print.o \
names.o color.o bpf.o exec.o fs.o
 
-NLOBJ=libgenl.o ll_map.o libnetlink.o
+NLOBJ=libgenl.o libnetlink.o
 
 all: libnetlink.a libutil.a
 
-- 
1.7.10.4



[PATCH iproute2-next 6/6] ipaddress: Get rid of print_linkinfo_brief()

2018-01-30 Thread Serhey Popovych
Its functionality can be fully accomplished by print_linkinfo(): no
need to duplicate code.

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h |2 -
 ip/ipaddress.c |  126 ++--
 ip/iplink.c|5 +--
 3 files changed, 33 insertions(+), 100 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index f5adbad..a7bbf1d 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -29,8 +29,6 @@ struct link_filter {
 int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
-int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg);
 int print_addrinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_addrlabel(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 5afc9d4..88e61e4 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -918,90 +918,6 @@ static void print_link_stats(FILE *fp, struct nlmsghdr *n)
fprintf(fp, "%s", _SL_);
 }
 
-int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg)
-{
-   FILE *fp = (FILE *)arg;
-   struct ifinfomsg *ifi = NLMSG_DATA(n);
-   struct rtattr *tb[IFLA_MAX+1];
-   int len = n->nlmsg_len;
-   const char *name;
-   unsigned int m_flag = 0;
-
-   if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
-   return -1;
-
-   len -= NLMSG_LENGTH(sizeof(*ifi));
-   if (len < 0)
-   return -1;
-
-   if (filter.ifindex && ifi->ifi_index != filter.ifindex)
-   return -1;
-   if (filter.up && !(ifi->ifi_flags&IFF_UP))
-   return -1;
-
-   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-
-   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
-   if (!name)
-   return -1;
-
-   if (filter.label &&
-   (!filter.family || filter.family == AF_PACKET) &&
-   fnmatch(filter.label, name, 0))
-   return -1;
-
-   if (tb[IFLA_GROUP]) {
-   int group = rta_getattr_u32(tb[IFLA_GROUP]);
-
-   if (filter.group != -1 && group != filter.group)
-   return -1;
-   }
-
-   if (tb[IFLA_MASTER]) {
-   int master = rta_getattr_u32(tb[IFLA_MASTER]);
-
-   if (filter.master > 0 && master != filter.master)
-   return -1;
-   } else if (filter.master > 0)
-   return -1;
-
-   if (filter.kind && match_link_kind(tb, filter.kind, 0))
-   return -1;
-
-   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
-   return -1;
-
-   if (n->nlmsg_type == RTM_DELLINK)
-   print_bool(PRINT_ANY, "deleted", "Deleted ", true);
-
-   m_flag = print_name_and_link("%-16s ", COLOR_NONE, name, tb);
-
-   if (tb[IFLA_OPERSTATE])
-   print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
-
-   if (filter.family == AF_PACKET) {
-   SPRINT_BUF(b1);
-
-   if (tb[IFLA_ADDRESS]) {
-   print_color_string(PRINT_ANY, COLOR_MAC,
-  "address", "%s ",
-  ll_addr_n2a(
-  RTA_DATA(tb[IFLA_ADDRESS]),
-  
RTA_PAYLOAD(tb[IFLA_ADDRESS]),
-  ifi->ifi_type,
-  b1, sizeof(b1)));
-   }
-   }
-
-   if (filter.family == AF_PACKET) {
-   print_link_flags(fp, ifi->ifi_flags, m_flag);
-   print_string(PRINT_FP, NULL, "%s", "\n");
-   }
-   fflush(fp);
-   return 0;
-}
-
 static const char *link_events[] = {
[IFLA_EVENT_NONE] = "NONE",
[IFLA_EVENT_REBOOT] = "REBOOT",
@@ -1033,6 +949,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
int len = n->nlmsg_len;
const char *name;
unsigned int m_flag = 0;
+   SPRINT_BUF(b1);
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
return 0;
@@ -1081,6 +998,34 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
+   if (brief) {
+   print_name_and_link("%-16s ", COLOR_NONE, name, tb);
+
+   if (tb[IFLA_OPERSTATE])
+   print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
+
+   if (filter.family == AF_PACKET) {
+   if (tb[IFLA_ADDRESS]) {
+   struct rtattr *rta = tb[IFLA_ADDRESS];
+
+   print_color_string(PRINT_ANY,
+

[PATCH iproute2-next 4/6] utils: Introduce and use get_ifname_rta()

2018-01-30 Thread Serhey Popovych
Be consistent in handling the IFLA_IFNAME attribute in all places: if
the attribute is missing, report a bug to stderr and use ll_index_to_name()
as a last resort to get a name in "if%d" format instead of "".

Use check_ifname() to validate the network device name: this catches
unexpected names returned both by the kernel and by ll_index_to_name().

Signed-off-by: Serhey Popovych 
---
 bridge/link.c   |8 
 include/utils.h |1 +
 ip/ipaddress.c  |   20 
 lib/utils.c |   19 +++
 4 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 870ebe0..a11cbb1 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -99,9 +99,10 @@ int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg)
 {
FILE *fp = arg;
-   int len = n->nlmsg_len;
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
+   int len = n->nlmsg_len;
+   const char *name;
 
len -= NLMSG_LENGTH(sizeof(*ifi));
if (len < 0) {
@@ -117,10 +118,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
parse_rtattr_flags(tb, IFLA_MAX, IFLA_RTA(ifi), len, NLA_F_NESTED);
 
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: nil ifname\n");
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
return -1;
-   }
 
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
diff --git a/include/utils.h b/include/utils.h
index 0394268..5738c97 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -173,6 +173,7 @@ void duparg(const char *, const char *) 
__attribute__((noreturn));
 void duparg2(const char *, const char *) __attribute__((noreturn));
 int check_ifname(const char *);
 int get_ifname(char *, const char *);
+const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
 int matches(const char *arg, const char *pattern);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index c15abd1..1797927 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -942,12 +942,10 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", 
ifi->ifi_index);
-   name = ll_index_to_name(ifi->ifi_index);
-   } else {
-   name = rta_getattr_str(tb[IFLA_IFNAME]);
-   }
+
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
+   return -1;
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
@@ -1069,12 +1067,10 @@ int print_linkinfo(const struct sockaddr_nl *who,
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", 
ifi->ifi_index);
-   name = ll_index_to_name(ifi->ifi_index);
-   } else {
-   name = rta_getattr_str(tb[IFLA_IFNAME]);
-   }
+
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
+   return -1;
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
diff --git a/lib/utils.c b/lib/utils.c
index 8e15625..29e2d84 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -871,6 +871,25 @@ int get_ifname(char *buf, const char *name)
return ret;
 }
 
+const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
+{
+   const char *name;
+
+   if (rta) {
+   name = rta_getattr_str(rta);
+   } else {
+   fprintf(stderr,
+   "BUG: device with ifindex %d has nil ifname\n",
+   ifindex);
+   name = ll_index_to_name(ifindex);
+   }
+
+   if (check_ifname(name))
+   return NULL;
+
+   return name;
+}
+
 int matches(const char *cmd, const char *pattern)
 {
int len = strlen(cmd);
-- 
1.7.10.4
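
The fallback-then-validate flow of get_ifname_rta() can be tried in
isolation. The following is a sketch under assumptions, not the iproute2
code: sketch_check_ifname() is a simplified stand-in for check_ifname()
(the real rules may differ), the "if%d" fallback is formatted directly
instead of calling ll_index_to_name(), and the attribute payload is passed
as a plain string that is NULL when IFLA_IFNAME is missing.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16	/* same value as IFNAMSIZ on Linux */

/* Simplified stand-in for check_ifname(): returns 0 if the name looks valid. */
static int sketch_check_ifname(const char *name)
{
	if (name == NULL || *name == '\0')
		return -1;
	if (strlen(name) >= SKETCH_IFNAMSIZ)
		return -1;

	for (; *name; name++) {
		if (*name == '/' || isspace((unsigned char)*name))
			return -1;
	}
	return 0;
}

/*
 * attr_name: string carried by the name attribute, or NULL if it is missing.
 * buf:       caller-provided storage for the "if%d" fallback name.
 * Returns the name to use, or NULL if it does not pass validation.
 */
static const char *sketch_get_ifname(int ifindex, const char *attr_name,
				     char *buf, size_t buflen)
{
	const char *name = attr_name;

	if (name == NULL) {
		fprintf(stderr,
			"BUG: device with ifindex %d has nil ifname\n",
			ifindex);
		snprintf(buf, buflen, "if%d", ifindex);
		name = buf;
	}

	if (sketch_check_ifname(name))
		return NULL;

	return name;
}
```

Callers then only need a single NULL check, which mirrors the
"if (!name) return -1;" pattern used at each call site in the patch.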



[PATCH iproute2-next 2/6] ipaddress: Simplify print_linkinfo_brief() and it's usage

2018-01-30 Thread Serhey Popovych
Improve print_linkinfo_brief() and its callers:

  1) Get rid of the custom @struct filter pointer @pfilter: it is NULL in
     all callers anyway, so the global @filter is used.

  2) Simplify the calling code in ipaddr_list_flush_or_save() by
     introducing an intermediate variable of @struct nlmsghdr and drop
     duplicated code: print_linkinfo_brief() only returns values <= 0,
     so we can move print_selected_addrinfo() outside of each block.

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h |3 +--
 ip/ipaddress.c |   60 
 ip/iplink.c|2 +-
 3 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 3203f0c..f5adbad 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -30,8 +30,7 @@ int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg,
-struct link_filter *filter);
+struct nlmsghdr *n, void *arg);
 int print_addrinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_addrlabel(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index f8fd392..c15abd1 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -919,8 +919,7 @@ static void print_link_stats(FILE *fp, struct nlmsghdr *n)
 }
 
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg,
-struct link_filter *pfilter)
+struct nlmsghdr *n, void *arg)
 {
FILE *fp = (FILE *)arg;
struct ifinfomsg *ifi = NLMSG_DATA(n);
@@ -937,12 +936,9 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (len < 0)
return -1;
 
-   if (!pfilter)
-   pfilter = &filter;
-
-   if (pfilter->ifindex && ifi->ifi_index != pfilter->ifindex)
+   if (filter.ifindex && ifi->ifi_index != filter.ifindex)
return -1;
-   if (pfilter->up && !(ifi->ifi_flags&IFF_UP))
+   if (filter.up && !(ifi->ifi_flags&IFF_UP))
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
@@ -953,30 +949,30 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
name = rta_getattr_str(tb[IFLA_IFNAME]);
}
 
-   if (pfilter->label &&
-   (!pfilter->family || pfilter->family == AF_PACKET) &&
-   fnmatch(pfilter->label, name, 0))
+   if (filter.label &&
+   (!filter.family || filter.family == AF_PACKET) &&
+   fnmatch(filter.label, name, 0))
return -1;
 
if (tb[IFLA_GROUP]) {
int group = rta_getattr_u32(tb[IFLA_GROUP]);
 
-   if (pfilter->group != -1 && group != pfilter->group)
+   if (filter.group != -1 && group != filter.group)
return -1;
}
 
if (tb[IFLA_MASTER]) {
int master = rta_getattr_u32(tb[IFLA_MASTER]);
 
-   if (pfilter->master > 0 && master != pfilter->master)
+   if (filter.master > 0 && master != filter.master)
return -1;
-   } else if (pfilter->master > 0)
+   } else if (filter.master > 0)
return -1;
 
-   if (pfilter->kind && match_link_kind(tb, pfilter->kind, 0))
+   if (filter.kind && match_link_kind(tb, filter.kind, 0))
return -1;
 
-   if (pfilter->slave_kind && match_link_kind(tb, pfilter->slave_kind, 1))
+   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
return -1;
 
if (n->nlmsg_type == RTM_DELLINK)
@@ -1006,7 +1002,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-   if (pfilter->family == AF_PACKET) {
+   if (filter.family == AF_PACKET) {
SPRINT_BUF(b1);
 
if (tb[IFLA_ADDRESS]) {
@@ -1020,7 +1016,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
}
}
 
-   if (pfilter->family == AF_PACKET) {
+   if (filter.family == AF_PACKET) {
print_link_flags(fp, ifi->ifi_flags, m_flag);
print_string(PRINT_FP, NULL, "%s", "\n");
}
@@ -2188,25 +2184,21 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
ipaddr_filter(&linfo, ainfo);
 
for (l = linfo.head; l; l = l->next) {
-   int res = 0;
-   struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
+   struct nlmsghdr *n = &l->h;
+   struct ifinfomsg *ifi = NLMSG_DATA(n);
+   int res;
 
open_json_object(NULL);
-   if (brief)

Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread kbuild test robot
Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.15 next-20180126]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Axtens/bnx2x-disable-GSO-on-too-large-packets/20180131-000934
config: i386-randconfig-a1-201804 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/linux/linkage.h:7:0,
from include/linux/kernel.h:7,
from include/linux/list.h:9,
from include/linux/module.h:9,
from net/core/skbuff.c:41:
>> net/core/skbuff.c:4968:19: error: 'skb_gso_validate_network_len' undeclared here (not in a function)
EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
  ^
   include/linux/export.h:65:16: note: in definition of macro '___EXPORT_SYMBOL'
 extern typeof(sym) sym;  \
   ^
   net/core/skbuff.c:4968:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
^

vim +/skb_gso_validate_network_len +4968 net/core/skbuff.c

  4954  
  4955  /**
  4956   * skb_gso_validate_mtu - Return in case such skb fits a given MTU
  4957   *
  4958   * @skb: GSO skb
  4959   * @mtu: MTU to validate against
  4960   *
  4961   * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
  4962   * once split.
  4963   */
  4964  bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
  4965  {
  4966  return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
  4967  }
> 4968  EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
  4969  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH net,stable] qmi_wwan: Add support for Quectel EP06

2018-01-30 Thread Bjørn Mork
Kristian Evensen  writes:

> The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as
> the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work.
>
> Signed-off-by: Kristian Evensen 

Acked-by: Bjørn Mork 


Re: [PATCH v3 1/2] net: create skb_gso_validate_mac_len()

2018-01-30 Thread kbuild test robot
Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.15 next-20180126]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Axtens/bnx2x-disable-GSO-on-too-large-packets/20180131-000934
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   In file included from include/linux/linkage.h:7:0,
from include/linux/kernel.h:7,
from include/linux/list.h:9,
from include/linux/module.h:9,
from net//core/skbuff.c:41:
>> net//core/skbuff.c:4968:19: error: 'skb_gso_validate_network_len' undeclared here (not in a function); did you mean 'skb_gso_validate_mac_len'?
EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
  ^
   include/linux/export.h:65:16: note: in definition of macro '___EXPORT_SYMBOL'
 extern typeof(sym) sym;  \
   ^~~
   net//core/skbuff.c:4968:1: note: in expansion of macro 'EXPORT_SYMBOL_GPL'
EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
^

vim +4968 net//core/skbuff.c

  4954  
  4955  /**
  4956   * skb_gso_validate_mtu - Return in case such skb fits a given MTU
  4957   *
  4958   * @skb: GSO skb
  4959   * @mtu: MTU to validate against
  4960   *
  4961   * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
  4962   * once split.
  4963   */
  4964  bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
  4965  {
  4966  return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
  4967  }
> 4968  EXPORT_SYMBOL_GPL(skb_gso_validate_network_len);
  4969  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-30 Thread Dave Watson
On 01/30/18 06:51 AM, Atul Gupta wrote:

> What I was referring is that passing "tls" ulp type in setsockopt
> may be insufficient to make the decision when multi HW assist Inline
> TLS solution exists.

Setting the ULP doesn't choose the HW or SW implementation; I think that
should be done later, when setting up crypto with

setsockopt(SOL_TLS, TLS_TX, struct crypto_info).

Any reason we can't use ethtool to choose HW vs SW implementation, if
available on the device?

> Some HW may go beyond defining sendmsg/sendpage of the prot and
> require additional info to setup the env? Also, we need to keep
> vendor specific code out of tls_main.c i.e anything other than
> base/sw_tx prot perhaps go to hw driver.

Sure, but I think we can add hooks to tls_main to do this without a
new ULP.

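For reference, the two-step setup discussed above can be sketched from
userspace roughly as follows. This is a minimal sketch, assuming the upstream
kTLS socket options (TCP_ULP, SOL_TLS, TLS_TX and
struct tls12_crypto_info_aes_gcm_128 from <linux/tls.h>); the helper name
enable_ktls_tx() is hypothetical and the fallback #defines are only there for
older libc headers. Nothing in it selects a HW or SW implementation; it only
shows that attaching the ULP and handing over the crypto state are separate
calls.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/tls.h>

/* Fallbacks for older libc headers (values as used by the Linux UAPI). */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/*
 * Hypothetical helper: attach the "tls" ULP to a connected TCP socket,
 * then pass the TX crypto state. The caller is expected to have filled
 * ci->info.version (e.g. TLS_1_2_VERSION), ci->info.cipher_type
 * (e.g. TLS_CIPHER_AES_GCM_128) and the key/iv/salt/rec_seq fields.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_ktls_tx(int fd,
			  const struct tls12_crypto_info_aes_gcm_128 *ci)
{
	/* Step 1: choose the ULP; no implementation is picked here. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
		return -1;

	/* Step 2: set up crypto for the transmit direction. */
	return setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}
```

The second call is the point where, per the suggestion above, a HW or SW
path could be chosen without introducing a new ULP name.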

Re: [PATCH iproute2-next 6/6] ipaddress: Get rid of print_linkinfo_brief()

2018-01-30 Thread Stephen Hemminger
On Tue, 30 Jan 2018 18:52:48 +0200
Serhey Popovych  wrote:

> + if (brief) {
> + print_name_and_link("%-16s ", COLOR_NONE, name, tb);
> +
> + if (tb[IFLA_OPERSTATE])
> + print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> +
> + if (filter.family == AF_PACKET) {
> + if (tb[IFLA_ADDRESS]) {
> + struct rtattr *rta = tb[IFLA_ADDRESS];
> +
> + print_color_string(PRINT_ANY,
> +COLOR_MAC,
> +"address",
> +"%s ",
> +ll_addr_n2a(RTA_DATA(rta),
> +RTA_PAYLOAD(rta),
> +ifi->ifi_type,
> +b1, sizeof(b1)));
> + }
> +
> + print_link_flags(fp, ifi->ifi_flags, m_flag);
> + print_string(PRINT_FP, NULL, "%s", "\n");
> + }
> +
> + fflush(fp);
> + return 0;
> + }

To keep the function shorter and therefore more readable, why not:

    if (brief)
        return print_linkinfo_brief(fp, ifi, tb);

And put this if branch in a new version of print_linkinfo_brief().


[PATCH net] r8169: fix RTL8168EP take too long to complete driver initialization.

2018-01-30 Thread Chunhao Lin
The driver checks the wrong register bit in rtl_ocp_tx_cond(), which keeps
the driver waiting until timeout.

Fix this by waiting for the right register bit.

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 734286e..dd713df 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1395,7 +1395,7 @@ DECLARE_RTL_COND(rtl_ocp_tx_cond)
 {
void __iomem *ioaddr = tp->mmio_addr;
 
-   return RTL_R8(IBISR0) & 0x02;
+   return RTL_R8(IBISR0) & 0x20;
 }
 
 static void rtl8168ep_stop_cmac(struct rtl8169_private *tp)
@@ -1403,7 +1403,7 @@ static void rtl8168ep_stop_cmac(struct rtl8169_private *tp)
void __iomem *ioaddr = tp->mmio_addr;
 
RTL_W8(IBCR2, RTL_R8(IBCR2) & ~0x01);
-   rtl_msleep_loop_wait_low(tp, &rtl_ocp_tx_cond, 50, 2000);
+   rtl_msleep_loop_wait_high(tp, &rtl_ocp_tx_cond, 50, 2000);
RTL_W8(IBISR0, RTL_R8(IBISR0) | 0x20);
RTL_W8(IBCR0, RTL_R8(IBCR0) & ~0x01);
 }
-- 
2.7.4

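The effect described in the patch above can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions: struct
fake_dev, read_status() and poll_bit() are hypothetical stand-ins, not r8169
code, and the simulated hardware simply raises bit 0x20 after a few reads.
It shows why polling a bit that never reaches the awaited state consumes the
whole timeout, while polling the right bit returns early.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with one status register. */
struct fake_dev {
	uint8_t status;   /* simulated status register */
	int ready_after;  /* number of reads after which bit 0x20 is raised */
	int polls;        /* reads performed so far */
};

/* Simulated register read: the "hardware" raises 0x20 when it is done. */
static uint8_t read_status(struct fake_dev *d)
{
	if (++d->polls >= d->ready_after)
		d->status |= 0x20;
	return d->status;
}

/*
 * Poll until (status & mask) is set (want_high) or clear (!want_high).
 * Returns the number of reads used, or -1 if max_polls reads were not
 * enough. A real driver would sleep between reads.
 */
static int poll_bit(struct fake_dev *d, uint8_t mask, bool want_high,
		    int max_polls)
{
	for (int i = 1; i <= max_polls; i++) {
		bool high = (read_status(d) & mask) != 0;

		if (high == want_high)
			return i;
	}
	return -1;
}
```

With ready_after = 3, waiting for 0x20 to go high finishes on the third read,
whereas waiting for 0x02 to go high never succeeds and runs into the timeout.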


Re: iproute2 4.14.1 tc class add come to kernel-panic

2018-01-30 Thread Roland Franke

Hello,

On Mon, Jan 29, 2018 at 9:03 AM, Cong Wang  
wrote:

On Mon, Jan 29, 2018 at 8:00 AM, Stephen Hemminger
 wrote:

On Mon, 29 Jan 2018 16:18:07 +0100
"Roland Franke"  wrote:


Hello,

> To: Roland Franke ; netdev@vger.kernel.org
> Subject: Re: BUG: iproute2 4.14.1 tc class add come to kernel-panic
>>
>> tc qdisc add dev eth0 root handle 20: htb default 4 r2q 1
>> tc class add dev eth0 parent 20: classid 20:7 htb rate 1kbit
>> tc qdisc add dev eth0 parent 20:7 sfq perturb 10
>> tc class add dev eth0 parent 20:7 classid 20:1 htb rate 200kbit ceil
>> 1kbit prio 0
>>
>> I get a kernel panic with the following output:
>> kern.err kernel: BUG: scheduling while atomic: tc/1036/0x0200
>

> Would you have a stack trace to share with us ?

As I am an absolute newbie here, I do not know how to
get the stack trace out.
If I get some information on how to do this, I can try to
give you this information.
In my last tests I ran the first 3 commands on a console
and had no error. Only on typing the last line do I get the error, and
there I currently get only the "kern.err kernel: BUG: ." message.

Roland


It generates this with lockdep (on 4.15)

[  151.355076] HTB: quantum of class 27 is big. Consider r2q change.



We hold qdisc tree spinlock but call rcu_barrier() in
mini_qdisc_pair_swap()...



Well, not min_qdisc things, but it should be resolved by:



commit efbf78973978b0d25af59bc26c8013a942af6e64
Author: Cong Wang 
Date:   Mon Dec 4 10:48:18 2017 -0800



net_sched: get rid of rcu_barrier() in tcf_block_put_ext()


Against which kernel version was this made?
The patch from https://patchwork.ozlabs.org/patch/844372/
does not apply to kernel 4.14.15.



[PATCH net] netfilter: on sockopt() acquire sock lock only in the required scope

2018-01-30 Thread Paolo Abeni
Syzbot reported several deadlocks in the netfilter area caused by
rtnl lock and socket lock being acquired in a different order on
different code paths, leading to backtraces like the following one:

==
WARNING: possible circular locking dependency detected
4.15.0-rc9+ #212 Not tainted
--
syzkaller041579/3682 is trying to acquire lock:
  (sk_lock-AF_INET6){+.+.}, at: [<8775e4dd>] lock_sock
include/net/sock.h:1463 [inline]
  (sk_lock-AF_INET6){+.+.}, at: [<8775e4dd>]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167

but task is already holding lock:
  (rtnl_mutex){+.+.}, at: [<4342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.}:
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
find_check_entry.isra.7+0x935/0xcf0
net/ipv6/netfilter/ip6_tables.c:580
translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0

-> #0 (sk_lock-AF_INET6){+.+.}:
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
lock_sock include/net/sock.h:1463 [inline]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(rtnl_mutex);
                                lock(sk_lock-AF_INET6);
                                lock(rtnl_mutex);
   lock(sk_lock-AF_INET6);

  *** DEADLOCK ***

1 lock held by syzkaller041579/3682:
  #0:  (rtnl_mutex){+.+.}, at: [<4342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

The problem, as Florian noted, is that nf_setsockopt() is always
called with the socket lock held, even if the lock itself is required only
for very tight scopes and only for some operations.

This patch addresses the issue by moving the lock_sock() call to where it
is really needed, namely ipv*_getorigdst(), so that nf_setsockopt()
no longer needs to acquire both locks.

Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Reported-by: syzbot+a4c2dc980ac1af699...@syzkaller.appspotmail.com
Suggested-by: Florian Westphal 
Signed-off-by: Paolo Abeni 
---
 net/ipv4/ip_sockglue.c | 14 --
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |  6 +-
 net/ipv6/ipv6_sockglue.c   | 17 +
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 18 --
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 60fb1eb7d7d8..c7df4969f80a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1251,11 +1251,8 @@ int ip_setsockopt(struct sock *sk, int level,
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
optname != IP_IPSEC_POLICY &&
optname != IP_XFRM_POLICY &&
-   !ip_mroute_opt(optname)) {
-   lock_sock(sk);
+   !ip_mroute_opt(optname))
err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
-   release_sock(sk);
-   }
 #endif
return err;
 }
@@ -1280,12 +1277,9 @@ int compat_ip_setsockopt(struct sock *sk, int level, int optname,
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
optname != IP_
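
The lock-order inversion reported above, and why narrowing the socket lock
scope removes it, can be modelled with a tiny order checker. This is a
minimal sketch, not kernel code and not lockdep: struct order_checker,
checker_acquire(), checker_release() and run_paths() are hypothetical names,
and the two "paths" only mimic the order in which the two locks are taken.

```c
#include <stdbool.h>

enum { LOCK_RTNL, LOCK_SOCK, NLOCKS };

/* Records which lock was held while another one was acquired. */
struct order_checker {
	bool held[NLOCKS];
	bool before[NLOCKS][NLOCKS]; /* before[a][b]: a held when b taken */
};

/* Returns false if taking lock l inverts an order seen earlier. */
static bool checker_acquire(struct order_checker *c, int l)
{
	bool ok = true;

	for (int h = 0; h < NLOCKS; h++) {
		if (!c->held[h] || h == l)
			continue;
		if (c->before[l][h])
			ok = false;      /* l -> h was seen, now h -> l */
		c->before[h][l] = true;
	}
	c->held[l] = true;
	return ok;
}

static void checker_release(struct order_checker *c, int l)
{
	c->held[l] = false;
}

/*
 * Path A models a netfilter sockopt handler that ends up taking rtnl;
 * nf_path_holds_sock_lock selects whether the caller wraps it in the
 * socket lock (old behaviour) or not (behaviour after the patch).
 * Path B models a setsockopt that takes rtnl first, then the socket lock.
 * Returns true if no inversion was observed.
 */
static bool run_paths(bool nf_path_holds_sock_lock)
{
	struct order_checker c = { 0 };
	bool ok = true;

	if (nf_path_holds_sock_lock)
		ok &= checker_acquire(&c, LOCK_SOCK);
	ok &= checker_acquire(&c, LOCK_RTNL);
	checker_release(&c, LOCK_RTNL);
	if (nf_path_holds_sock_lock)
		checker_release(&c, LOCK_SOCK);

	ok &= checker_acquire(&c, LOCK_RTNL);
	ok &= checker_acquire(&c, LOCK_SOCK);
	checker_release(&c, LOCK_SOCK);
	checker_release(&c, LOCK_RTNL);

	return ok;
}
```

run_paths(true) reports an inversion (socket lock then rtnl on one path,
rtnl then socket lock on the other), while run_paths(false) does not.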

[PATCH iproute2-next v2 5/6] utils: Introduce and use print_name_and_link() to print name@link

2018-01-30 Thread Serhey Popovych
There are at least three places implementing the same thing: two in
ipaddress.c, print_linkinfo() & print_linkinfo_brief(), and one in
bridge/link.c.

These implementations diverge from each other very little:
bridge/link.c does not support JSON output at the moment and
print_linkinfo_brief() does not handle IFLA_LINK_NETNS case.

Introduce and use print_name_and_link() routine to handle name@link
output in all possible variations; respect IFLA_LINK_NETNS attribute to
handle case when link is in different namespace; use "if%d" template
for interface name instead of "" to share logic with other
code (e.g. ll_name_to_index() and ll_index_to_name()) supporting such
template.

Signed-off-by: Serhey Popovych 
---
 bridge/link.c   |   13 +++--
 include/utils.h |4 
 ip/ipaddress.c  |   48 ++--
 lib/utils.c |   51 +++
 4 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index a11cbb1..90c9734 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -125,20 +125,13 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
 
-   fprintf(fp, "%d: %s ", ifi->ifi_index,
-   tb[IFLA_IFNAME] ? rta_getattr_str(tb[IFLA_IFNAME]) : "");
+   fprintf(fp, "%d: ", ifi->ifi_index);
+
+   print_name_and_link("%s: ", COLOR_NONE, name, tb);
 
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-   if (tb[IFLA_LINK]) {
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   fprintf(fp, "@%s: ",
-   iflink ? ll_index_to_name(iflink) : "NONE");
-   } else
-   fprintf(fp, ": ");
-
print_link_flags(fp, ifi->ifi_flags);
 
if (tb[IFLA_MTU])
diff --git a/include/utils.h b/include/utils.h
index 5738c97..d217073 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -12,6 +12,7 @@
 #include "libnetlink.h"
 #include "ll_map.h"
 #include "rtm_map.h"
+#include "json_print.h"
 
 extern int preferred_family;
 extern int human_readable;
@@ -240,6 +241,9 @@ void print_escape_buf(const __u8 *buf, size_t len, const char *escape);
 int print_timestamp(FILE *fp);
 void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
 
+unsigned int print_name_and_link(const char *fmt, enum color_attr color,
+const char *name, struct rtattr *tb[]);
+
 #define BIT(nr) (1UL << (nr))
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 1797927..5afc9d4 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -926,7 +926,6 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
struct rtattr *tb[IFLA_MAX+1];
int len = n->nlmsg_len;
const char *name;
-   char buf[32] = { 0, };
unsigned int m_flag = 0;
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
@@ -976,26 +975,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
-   if (tb[IFLA_LINK]) {
-   SPRINT_BUF(b1);
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   if (iflink == 0) {
-   snprintf(buf, sizeof(buf), "%s@NONE", name);
-   print_null(PRINT_JSON, "link", NULL, NULL);
-   } else {
-   const char *link = ll_idx_n2a(iflink, b1);
-
-   print_string(PRINT_JSON, "link", NULL, link);
-   snprintf(buf, sizeof(buf), "%s@%s", name, link);
-   m_flag = ll_index_to_flags(iflink);
-   m_flag = !(m_flag & IFF_UP);
-   }
-   } else
-   snprintf(buf, sizeof(buf), "%s", name);
-
-   print_string(PRINT_FP, NULL, "%-16s ", buf);
-   print_string(PRINT_JSON, "ifname", NULL, name);
+   m_flag = print_name_and_link("%-16s ", COLOR_NONE, name, tb);
 
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
@@ -1102,31 +1082,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
-   print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s", name);
-
-   if (tb[IFLA_LINK]) {
-   int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-
-   if (iflink == 0)
-   print_null(PRINT_ANY, "link", "@%s: ", "NONE");
-   else {
-   if (tb[IFLA_LINK_NETNSID])
-   print_int(PRINT_ANY,
- "link_index", "@if%d: ", iflink);
-  

[PATCH iproute2-next v2 2/6] ipaddress: Simplify print_linkinfo_brief() and it's usage

2018-01-30 Thread Serhey Popovych
Improve print_linkinfo_brief() and its callers:

  1) Get rid of custom @struct filter pointer @pfilter: it is NULL in
     all callers anyway and global @filter is used.

  2) Simplify calling code in ipaddr_list_flush_or_save() by
     introducing an intermediate variable of @struct nlmsghdr and drop
     duplicated code: print_linkinfo_brief() only ever returns values
     <= 0, so we can move print_selected_addrinfo() outside of each
     block.

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h |3 +--
 ip/ipaddress.c |   60 
 ip/iplink.c|2 +-
 3 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 3203f0c..f5adbad 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -30,8 +30,7 @@ int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg,
-struct link_filter *filter);
+struct nlmsghdr *n, void *arg);
 int print_addrinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_addrlabel(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index f8fd392..c15abd1 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -919,8 +919,7 @@ static void print_link_stats(FILE *fp, struct nlmsghdr *n)
 }
 
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg,
-struct link_filter *pfilter)
+struct nlmsghdr *n, void *arg)
 {
FILE *fp = (FILE *)arg;
struct ifinfomsg *ifi = NLMSG_DATA(n);
@@ -937,12 +936,9 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (len < 0)
return -1;
 
-   if (!pfilter)
-   pfilter = &filter;
-
-   if (pfilter->ifindex && ifi->ifi_index != pfilter->ifindex)
+   if (filter.ifindex && ifi->ifi_index != filter.ifindex)
return -1;
-   if (pfilter->up && !(ifi->ifi_flags&IFF_UP))
+   if (filter.up && !(ifi->ifi_flags&IFF_UP))
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
@@ -953,30 +949,30 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
name = rta_getattr_str(tb[IFLA_IFNAME]);
}
 
-   if (pfilter->label &&
-   (!pfilter->family || pfilter->family == AF_PACKET) &&
-   fnmatch(pfilter->label, name, 0))
+   if (filter.label &&
+   (!filter.family || filter.family == AF_PACKET) &&
+   fnmatch(filter.label, name, 0))
return -1;
 
if (tb[IFLA_GROUP]) {
int group = rta_getattr_u32(tb[IFLA_GROUP]);
 
-   if (pfilter->group != -1 && group != pfilter->group)
+   if (filter.group != -1 && group != filter.group)
return -1;
}
 
if (tb[IFLA_MASTER]) {
int master = rta_getattr_u32(tb[IFLA_MASTER]);
 
-   if (pfilter->master > 0 && master != pfilter->master)
+   if (filter.master > 0 && master != filter.master)
return -1;
-   } else if (pfilter->master > 0)
+   } else if (filter.master > 0)
return -1;
 
-   if (pfilter->kind && match_link_kind(tb, pfilter->kind, 0))
+   if (filter.kind && match_link_kind(tb, filter.kind, 0))
return -1;
 
-   if (pfilter->slave_kind && match_link_kind(tb, pfilter->slave_kind, 1))
+   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
return -1;
 
if (n->nlmsg_type == RTM_DELLINK)
@@ -1006,7 +1002,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-   if (pfilter->family == AF_PACKET) {
+   if (filter.family == AF_PACKET) {
SPRINT_BUF(b1);
 
if (tb[IFLA_ADDRESS]) {
@@ -1020,7 +1016,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
}
}
 
-   if (pfilter->family == AF_PACKET) {
+   if (filter.family == AF_PACKET) {
print_link_flags(fp, ifi->ifi_flags, m_flag);
print_string(PRINT_FP, NULL, "%s", "\n");
}
@@ -2188,25 +2184,21 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
ipaddr_filter(&linfo, ainfo);
 
for (l = linfo.head; l; l = l->next) {
-   int res = 0;
-   struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
+   struct nlmsghdr *n = &l->h;
+   struct ifinfomsg *ifi = NLMSG_DATA(n);
+   int res;
 
open_json_object(NULL);
-   if (brief)

[PATCH iproute2-next v2 4/6] utils: Introduce and use get_ifname_rta()

2018-01-30 Thread Serhey Popovych
Be consistent in handling of the IFLA_IFNAME attribute in all places: if
the attribute is missing, report a bug to stderr and use ll_index_to_name()
as a last resort to get the name in "if%d" format instead of "".

Use check_ifname() to validate network device name: this catches both
unexpected return from kernel and ll_index_to_name().

Signed-off-by: Serhey Popovych 
---
 bridge/link.c   |8 
 include/utils.h |1 +
 ip/ipaddress.c  |   20 
 lib/utils.c |   19 +++
 4 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 870ebe0..a11cbb1 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -99,9 +99,10 @@ int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg)
 {
FILE *fp = arg;
-   int len = n->nlmsg_len;
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
+   int len = n->nlmsg_len;
+   const char *name;
 
len -= NLMSG_LENGTH(sizeof(*ifi));
if (len < 0) {
@@ -117,10 +118,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
parse_rtattr_flags(tb, IFLA_MAX, IFLA_RTA(ifi), len, NLA_F_NESTED);
 
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: nil ifname\n");
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
return -1;
-   }
 
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
diff --git a/include/utils.h b/include/utils.h
index 0394268..5738c97 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -173,6 +173,7 @@ void duparg(const char *, const char *) __attribute__((noreturn));
 void duparg2(const char *, const char *) __attribute__((noreturn));
 int check_ifname(const char *);
 int get_ifname(char *, const char *);
+const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
 int matches(const char *arg, const char *pattern);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index c15abd1..1797927 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -942,12 +942,10 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", ifi->ifi_index);
-   name = ll_index_to_name(ifi->ifi_index);
-   } else {
-   name = rta_getattr_str(tb[IFLA_IFNAME]);
-   }
+
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
+   return -1;
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
@@ -1069,12 +1067,10 @@ int print_linkinfo(const struct sockaddr_nl *who,
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL) {
-   fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", ifi->ifi_index);
-   name = ll_index_to_name(ifi->ifi_index);
-   } else {
-   name = rta_getattr_str(tb[IFLA_IFNAME]);
-   }
+
+   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
+   if (!name)
+   return -1;
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
diff --git a/lib/utils.c b/lib/utils.c
index 8e15625..29e2d84 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -871,6 +871,25 @@ int get_ifname(char *buf, const char *name)
return ret;
 }
 
+const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
+{
+   const char *name;
+
+   if (rta) {
+   name = rta_getattr_str(rta);
+   } else {
+   fprintf(stderr,
+   "BUG: device with ifindex %d has nil ifname\n",
+   ifindex);
+   name = ll_index_to_name(ifindex);
+   }
+
+   if (check_ifname(name))
+   return NULL;
+
+   return name;
+}
+
 int matches(const char *cmd, const char *pattern)
 {
int len = strlen(cmd);
-- 
1.7.10.4



[PATCH iproute2-next v2 0/6] ipaddress: Get rid of print_linkinfo_brief()

2018-01-30 Thread Serhey Popovych
With this series I propose to get rid of custom print_linkinfo_brief()
in favor of print_linkinfo() to avoid code duplication.

The changes presented in this series were tested using the following script:

iproute2_dir="$1"
iface='eth0.2'

pushd "$iproute2_dir" &>/dev/null

for i in new old; do
    DIR="/tmp/$i"
    mkdir -p "$DIR"

    ln -snf ip.$i ip/ip

    # normal
    ip/ip link show  >"$DIR/ip-link-show"
    ip/ip -4 addr show   >"$DIR/ip-4-addr-show"
    ip/ip -6 addr show   >"$DIR/ip-6-addr-show"
    ip/ip addr show dev "$iface" >"$DIR/ip-addr-show-$iface"

    # brief
    ip/ip -br link show  >"$DIR/ip-br-link-show"
    ip/ip -br -4 addr show   >"$DIR/ip-br-4-addr-show"
    ip/ip -br -6 addr show   >"$DIR/ip-br-6-addr-show"
    ip/ip -br addr show dev "$iface" >"$DIR/ip-br-addr-show-$iface"
done
rm -f ip/ip

diff -urN /tmp/{old,new}
rc=$?

popd &>/dev/null
exit $rc

Expected results : 
Actual results   : 

Although test coverage is far from ideal, in my opinion it covers the most
important aspects of the changes presented by the series.

All this work is done in preparation for iplink_get() enhancements to support
attribute parsing, which will eventually be used to simplify the ip/tunnel
RTM_GETLINK code.

As always, reviews, comments, suggestions and criticism are welcome.

v2
  Make print_linkinfo_brief() static instead of inlining its code into
  print_linkinfo(). Better for review, better for code style, compiler
  will optimize this anyway.

Thanks,
Serhii

Serhey Popovych (6):
  ipaddress: Improve print_linkinfo()
  ipaddress: Simplify print_linkinfo_brief() and it's usage
  lib: Correct object file dependencies
  utils: Introduce and use get_ifname_rta()
  utils: Introduce and use print_name_and_link() to print name@link
  ipaddress: Make print_linkinfo_brief() static

 bridge/link.c   |   21 +++
 include/utils.h |5 ++
 ip/ip_common.h  |3 -
 ip/ipaddress.c  |  172 ++-
 ip/iplink.c |5 +-
 lib/Makefile|4 +-
 lib/utils.c |   70 ++
 7 files changed, 114 insertions(+), 166 deletions(-)

-- 
1.7.10.4



[PATCH iproute2-next v2 1/6] ipaddress: Improve print_linkinfo()

2018-01-30 Thread Serhey Popovych
There are a few places to improve:

  1) return -1 when entry is filtered instead of zero, which means
     accept entry: ipaddr_list_flush_or_save() is the only user of this

  2) use ll_index_to_name() as last resort to translate index to name:
     it will return name in the form "if%d" in case of failure

  3) replace open coded access to IFLA_IFNAME attribute data via
     RTA_DATA() with rta_getattr_str()

  4) simplify ifname printing since name is never NULL, thanks to (2).

Signed-off-by: Serhey Popovych 
---
 ip/ipaddress.c |   30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 051a05f..f8fd392 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -948,14 +948,14 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
if (tb[IFLA_IFNAME] == NULL) {
fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", ifi->ifi_index);
-   name = "";
+   name = ll_index_to_name(ifi->ifi_index);
} else {
name = rta_getattr_str(tb[IFLA_IFNAME]);
}
 
if (pfilter->label &&
(!pfilter->family || pfilter->family == AF_PACKET) &&
-   fnmatch(pfilter->label, RTA_DATA(tb[IFLA_IFNAME]), 0))
+   fnmatch(pfilter->label, name, 0))
return -1;
 
if (tb[IFLA_GROUP]) {
@@ -1057,6 +1057,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct rtattr *tb[IFLA_MAX+1];
int len = n->nlmsg_len;
+   const char *name;
unsigned int m_flag = 0;
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
@@ -1067,18 +1068,22 @@ int print_linkinfo(const struct sockaddr_nl *who,
return -1;
 
if (filter.ifindex && ifi->ifi_index != filter.ifindex)
-   return 0;
+   return -1;
if (filter.up && !(ifi->ifi_flags&IFF_UP))
-   return 0;
+   return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-   if (tb[IFLA_IFNAME] == NULL)
+   if (tb[IFLA_IFNAME] == NULL) {
fprintf(stderr, "BUG: device with ifindex %d has nil ifname\n", 
ifi->ifi_index);
+   name = ll_index_to_name(ifi->ifi_index);
+   } else {
+   name = rta_getattr_str(tb[IFLA_IFNAME]);
+   }
 
if (filter.label &&
(!filter.family || filter.family == AF_PACKET) &&
-   fnmatch(filter.label, RTA_DATA(tb[IFLA_IFNAME]), 0))
-   return 0;
+   fnmatch(filter.label, name, 0))
+   return -1;
 
if (tb[IFLA_GROUP]) {
int group = rta_getattr_u32(tb[IFLA_GROUP]);
@@ -1105,16 +1110,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
-   if (tb[IFLA_IFNAME]) {
-   print_color_string(PRINT_ANY,
-  COLOR_IFNAME,
-  "ifname", "%s",
-  rta_getattr_str(tb[IFLA_IFNAME]));
-   } else {
-   print_null(PRINT_JSON, "ifname", NULL, NULL);
-   print_color_null(PRINT_FP, COLOR_IFNAME,
-"ifname", "%s", "");
-   }
+   print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s", name);
 
if (tb[IFLA_LINK]) {
int iflink = rta_getattr_u32(tb[IFLA_LINK]);
-- 
1.7.10.4
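
The name selection described in points (2) and (4) of the commit message
above can be sketched in isolation. This is only an illustration, not the
iproute2 code: fallback_index_to_name() is a hypothetical stand-in for
ll_index_to_name() that models nothing but the "if%d" failure case, and
pick_ifname() is an assumed helper name.

```c
#include <stdio.h>

/* Hypothetical stand-in for ll_index_to_name(): the real function first
 * tries to resolve the index; this sketch models only the "if%d" form
 * that the commit message says is returned on failure. */
const char *fallback_index_to_name(int ifindex)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "if%d", ifindex);
    return buf;
}

/* Prefer the name carried in the attribute; otherwise fall back, so the
 * caller never sees NULL and needs no special case when printing. */
const char *pick_ifname(const char *attr_name, int ifindex)
{
    if (attr_name == NULL)
        return fallback_index_to_name(ifindex);
    return attr_name;
}
```

With this shape, pick_ifname(NULL, 7) yields "if7", which is why the later
print path can drop its NULL branch.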



[PATCH iproute2-next v2 3/6] lib: Correct object file dependencies

2018-01-30 Thread Serhey Popovych
Neither internal libnetlink nor libgenl depends on ll_map.o: prepare for
upcoming changes that bring a much cleaner dependency between
utils.o and ll_map.o.

Signed-off-by: Serhey Popovych 
---
 lib/Makefile |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 7b34ed5..bab8cbf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -3,11 +3,11 @@ include ../config.mk
 
 CFLAGS += -fPIC
 
-UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
+UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o json_print.o \
names.o color.o bpf.o exec.o fs.o
 
-NLOBJ=libgenl.o ll_map.o libnetlink.o
+NLOBJ=libgenl.o libnetlink.o
 
 all: libnetlink.a libutil.a
 
-- 
1.7.10.4



[PATCH iproute2-next v2 6/6] ipaddress: Make print_linkinfo_brief() static

2018-01-30 Thread Serhey Popovych
It shares a lot of code with print_linkinfo(): drop the duplicated part,
change the parameter list, make it static and call it from
print_linkinfo() after the common path.

While there, move SPRINT_BUF() from the blocks to function scope to
avoid duplication, and use "%s" to print "\n" to help the compiler
optimize the exit for both print_linkinfo_brief() and normal paths.

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h |2 --
 ip/ipaddress.c |   74 
 ip/iplink.c|5 +---
 3 files changed, 11 insertions(+), 70 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index f5adbad..a7bbf1d 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -29,8 +29,6 @@ struct link_filter {
 int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
-int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg);
 int print_addrinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_addrlabel(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 5afc9d4..450d3cc 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -918,63 +918,12 @@ static void print_link_stats(FILE *fp, struct nlmsghdr *n)
fprintf(fp, "%s", _SL_);
 }
 
-int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg)
+static int print_linkinfo_brief(FILE *fp, const char *name,
+   const struct ifinfomsg *ifi,
+   struct rtattr *tb[])
 {
-   FILE *fp = (FILE *)arg;
-   struct ifinfomsg *ifi = NLMSG_DATA(n);
-   struct rtattr *tb[IFLA_MAX+1];
-   int len = n->nlmsg_len;
-   const char *name;
unsigned int m_flag = 0;
 
-   if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
-   return -1;
-
-   len -= NLMSG_LENGTH(sizeof(*ifi));
-   if (len < 0)
-   return -1;
-
-   if (filter.ifindex && ifi->ifi_index != filter.ifindex)
-   return -1;
-   if (filter.up && !(ifi->ifi_flags&IFF_UP))
-   return -1;
-
-   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
-
-   name = get_ifname_rta(ifi->ifi_index, tb[IFLA_IFNAME]);
-   if (!name)
-   return -1;
-
-   if (filter.label &&
-   (!filter.family || filter.family == AF_PACKET) &&
-   fnmatch(filter.label, name, 0))
-   return -1;
-
-   if (tb[IFLA_GROUP]) {
-   int group = rta_getattr_u32(tb[IFLA_GROUP]);
-
-   if (filter.group != -1 && group != filter.group)
-   return -1;
-   }
-
-   if (tb[IFLA_MASTER]) {
-   int master = rta_getattr_u32(tb[IFLA_MASTER]);
-
-   if (filter.master > 0 && master != filter.master)
-   return -1;
-   } else if (filter.master > 0)
-   return -1;
-
-   if (filter.kind && match_link_kind(tb, filter.kind, 0))
-   return -1;
-
-   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
-   return -1;
-
-   if (n->nlmsg_type == RTM_DELLINK)
-   print_bool(PRINT_ANY, "deleted", "Deleted ", true);
-
m_flag = print_name_and_link("%-16s ", COLOR_NONE, name, tb);
 
if (tb[IFLA_OPERSTATE])
@@ -1033,6 +982,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
int len = n->nlmsg_len;
const char *name;
unsigned int m_flag = 0;
+   SPRINT_BUF(b1);
 
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
return 0;
@@ -1081,6 +1031,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (n->nlmsg_type == RTM_DELLINK)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
+   if (brief)
+   return print_linkinfo_brief(fp, name, ifi, tb);
+
print_int(PRINT_ANY, "ifindex", "%d: ", ifi->ifi_index);
m_flag = print_name_and_link("%s: ", COLOR_IFNAME, name, tb);
print_link_flags(fp, ifi->ifi_flags, m_flag);
@@ -1097,8 +1050,6 @@ int print_linkinfo(const struct sockaddr_nl *who,
 "qdisc %s ",
 rta_getattr_str(tb[IFLA_QDISC]));
if (tb[IFLA_MASTER]) {
-   SPRINT_BUF(b1);
-
print_string(PRINT_ANY,
 "master",
 "master %s ",
@@ -1112,7 +1063,6 @@ int print_linkinfo(const struct sockaddr_nl *who,
print_linkmode(fp, tb[IFLA_LINKMODE]);
 
if (tb[IFLA_GROUP]) {
-   SPRINT_BUF(b1);
int group = rta_getattr_u32(tb[IFLA_GROUP]);
 
print_string(PRINT_ANY,
@@ -1279,7 +1229,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
close_json_array(PRINT_JSON, NULL);
}
 

Re: [PATCH iproute2-next 6/6] ipaddress: Get rid of print_linkinfo_brief()

2018-01-30 Thread Serhey Popovych
Stephen Hemminger wrote:
> On Tue, 30 Jan 2018 18:52:48 +0200
> Serhey Popovych  wrote:
> 
>> +if (brief) {
>> +print_name_and_link("%-16s ", COLOR_NONE, name, tb);
>> +
>> +if (tb[IFLA_OPERSTATE])
>> +print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> +
>> +if (filter.family == AF_PACKET) {
>> +if (tb[IFLA_ADDRESS]) {
>> +struct rtattr *rta = tb[IFLA_ADDRESS];
>> +
>> +print_color_string(PRINT_ANY,
>> +   COLOR_MAC,
>> +   "address",
>> +   "%s ",
>> +   ll_addr_n2a(RTA_DATA(rta),
>> +   RTA_PAYLOAD(rta),
>> +   ifi->ifi_type,
>> +   b1, sizeof(b1)));
>> +}
>> +
>> +print_link_flags(fp, ifi->ifi_flags, m_flag);
>> +print_string(PRINT_FP, NULL, "%s", "\n");
>> +}
>> +
>> +fflush(fp);
>> +return 0;
>> +}
> 
> To keep function shorter and therefore more readable, why not:
> 
>   if (brief)
>   return print_linkinfo_brief(fp, ifi, tb);
> 
> And put this if branch in new version of print_linkinfo_brief.
> 

Agree, will make it static and branch as suggested. Thanks.
Addressed in v2.






[PATCH] wireless: zd1211rw: remove redundant assignment of pointer 'q'

2018-01-30 Thread Colin King
From: Colin Ian King 

Pointer q is initialized and then almost immediately afterwards
re-assigned the same value. Remove the second redundant assignment.

Cleans up clang warning:
drivers/net/wireless/zydas/zd1211rw/zd_mac.c:503:23: warning: Value
stored to 'q' during its initialization is never read

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/zydas/zd1211rw/zd_mac.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c 
b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
index b785742bfd9e..b01b44a5d16e 100644
--- a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
@@ -509,7 +509,6 @@ void zd_mac_tx_failed(struct urb *urb)
int found = 0;
int i, position = 0;
 
-   q = &mac->ack_wait_queue;
spin_lock_irqsave(&q->lock, flags);
 
skb_queue_walk(q, skb) {
-- 
2.15.1



[PATCH] samples/bpf: Add program for CPU state statistics

2018-01-30 Thread Leo Yan
A CPU is active when it has running tasks, and the CPUFreq governor can
select different operating points (OPPs) according to the workload;
we use 'pstate' to denote the state of a CPU running tasks at one
specific OPP.  On the other hand, a CPU is idle when only the idle task
runs on it, and the CPUIdle governor can select one specific idle state
to power off hardware logic; we use 'cstate' to denote the CPU idle
state.

Based on the trace events 'cpu_idle' and 'cpu_frequency' we can gather
duration statistics for every state.  Every time a CPU enters or exits
an idle state, the trace event 'cpu_idle' is recorded; the trace event
'cpu_frequency' records CPU OPP changes, so it is easy to know how long
the CPU stays in a given OPP while it is not in any idle state.

This patch uses the mentioned trace events for pstate and cstate
statistics.  To achieve more accurate profiling data, the program
uses the sequence below to ensure CPU running/idle time is not missed:

- Before profiling, the user space program wakes up all CPUs once, to
  avoid missing the time accounting for CPUs that have stayed in an
  idle state for a long time; the program then sets 'scaling_max_freq'
  to the lowest frequency and restores it to the highest frequency.
  This ensures the frequency starts at the lowest value and can easily
  be raised once the workload starts running;

- The user space program reads the map data and updates the statistics
  every 5s, the same as other sample bpf programs, to avoid the large
  overhead the bpf program itself would otherwise introduce;

- When a signal is sent to terminate the program, the signal handler
  wakes up all CPUs, sets the lowest frequency and restores the highest
  frequency in 'scaling_max_freq'; this is exactly the same as the
  first step and avoids missing CPU pstate and cstate time during the
  last stage.  Finally it reports the latest statistics.
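
The accounting idea behind the steps above can be sketched in plain C.
This is not the code from the patch: struct cpu_stat, account_transition()
and MAX_STATES are hypothetical names, and the sketch assumes each event
carries a timestamp and the state being entered. It shows why a CPU that
never produces an event is never charged, which is what the initial and
final wake-ups are meant to avoid.

```c
#include <stdint.h>

#define MAX_STATES 8    /* illustrative bound, not taken from the patch */

/* Per-CPU bookkeeping: the current state, when it was entered, and the
 * time accumulated in each state so far. */
struct cpu_stat {
    int      cur_state;             /* -1 until the first event arrives */
    uint64_t last_ts_ns;
    uint64_t duration_ns[MAX_STATES];
};

/* Called for every state-change event: charge the elapsed time to the
 * state being left, then record the new state and its entry time. */
void account_transition(struct cpu_stat *st, int new_state, uint64_t now_ns)
{
    if (st->cur_state >= 0 && st->cur_state < MAX_STATES)
        st->duration_ns[st->cur_state] += now_ns - st->last_ts_ns;

    st->cur_state = new_state;
    st->last_ts_ns = now_ns;
}
```

Time is only credited when the next event arrives, so the interval after
the last event stays unaccounted until something forces another
transition.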

The program has been tested on a Hikey board with octa CA53 CPUs; below
is an example of the statistics result:

CPU 0
State: Duration(ms)  Distribution
cstate 0 : 47555|*   |
cstate 1 : 0||
cstate 2 : 0||
pstate 0 : 15239|*   |
pstate 1 : 1521 ||
pstate 2 : 3188 |*   |
pstate 3 : 1836 ||
pstate 4 : 94   ||

CPU 1
State: Duration(ms)  Distribution
cstate 0 : 87   ||
cstate 1 : 16264|**  |
cstate 2 : 50458|*** |
pstate 0 : 832  ||
pstate 1 : 131  ||
pstate 2 : 825  ||
pstate 3 : 787  ||
pstate 4 : 4||

CPU 2
State: Duration(ms)  Distribution
cstate 0 : 177  ||
cstate 1 : 9363 |*   |
cstate 2 : 55835|*** |
pstate 0 : 1468 ||
pstate 1 : 350  ||
pstate 2 : 1062 ||
pstate 3 : 1164 ||
pstate 4 : 7||

CPU 3
State: Duration(ms)  Distribution
cstate 0 : 89   ||
cstate 1 : 14546|*   |
cstate 2 : 51591|*** |
pstate 0 : 907  ||
pstate 1 : 231  ||
pstate 2 : 894  ||
pstate 3 : 1154 ||
pstate 4 : 17   ||

CPU 4
State: Duration(ms)  Distribution
cstate 0 : 101  ||
cstate 1 : 16904|*** |
cstate 2 : 49544|**  |
pstate 0 : 678  ||
pstate 1 : 230  ||
pstate 2 : 770  ||
pstate 3 : 1065 ||
pstate 4 : 8||

CPU 5
State:

Re: net: hang in unregister_netdevice: waiting for lo to become free

2018-01-30 Thread Cong Wang
On Tue, Jan 30, 2018 at 4:09 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program creates a hang in unregister_netdevice.
> cleanup_net work hangs there forever periodically printing
> "unregister_netdevice: waiting for lo to become free. Usage count = 3"
> and creation of any new network namespaces hangs forever.

Interestingly, this is not reproducible on net-next.


[PATCH 01/10] net/sched: kconfig: Remove empty help texts

2018-01-30 Thread Ulf Magnusson
In preparation for adding a warning ("kconfig: Warn if help text is
blank"): https://lkml.org/lkml/2018/1/30/516

Signed-off-by: Ulf Magnusson 
---
 net/sched/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03d86a7775e..f24a6ae6819a 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -857,17 +857,14 @@ config NET_ACT_TUNNEL_KEY
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_IFE_SKBPRIO
 tristate "Support to encoding decoding skb prio on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_IFE_SKBTCINDEX
 tristate "Support to encoding decoding skb tcindex on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_CLS_IND
bool "Incoming device classification"
-- 
2.14.1



[PATCH v2 01/10] net/sched: kconfig: Remove empty help texts

2018-01-30 Thread Ulf Magnusson
In preparation for adding a warning ("kconfig: Warn if help text is
blank"): https://lkml.org/lkml/2018/1/30/516

Signed-off-by: Ulf Magnusson 
---
 net/sched/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03d86a7775e..f24a6ae6819a 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -857,17 +857,14 @@ config NET_ACT_TUNNEL_KEY
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_IFE_SKBPRIO
 tristate "Support to encoding decoding skb prio on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_IFE_SKBTCINDEX
 tristate "Support to encoding decoding skb tcindex on IFE action"
 depends on NET_ACT_IFE
----help---
 
 config NET_CLS_IND
bool "Incoming device classification"
-- 
2.14.1



Re: sctp netns "unregister_netdevice: waiting for lo to become free. Usage count = 1"

2018-01-30 Thread Tommi Rantala

On 30.01.2018 17:59, Neil Horman wrote:

On Mon, Jan 29, 2018 at 05:55:45PM +0200, Tommi Rantala wrote:


ip netns add TEST
ip netns exec TEST ip link set lo up
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dev dummy0 netns TEST
ip link set dev dummy1 netns TEST
ip link set dev dummy2 netns TEST
ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0
ip netns exec TEST ip link set dummy0 up
ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1
ip netns exec TEST ip link set dummy1 up
ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2
ip netns exec TEST ip link set dummy2 up
ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 2
-s -B 192.168.1.3
ip netns del TEST


Does the problem occur if you don't set lo up?


Still happens after dropping "ip netns exec TEST ip link set lo up".

Omitting "-B 192.168.1.3" from the sctp_test args helps.

Tommi


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-01-30 Thread Andrew Morton
On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko  wrote:

> > Well, this is not about syzkaller, it merely pointed out a potential
> > DoS... And that has to be addressed somehow.
> 
> So how about this?
> ---

argh ;)

> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Tue, 30 Jan 2018 14:51:07 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive
> 
> syzbot has noticed that xt_alloc_table_info can allocate a lot of
> memory. This is an admin only interface but an admin in a namespace
> is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use
> kvmalloc() in xt_alloc_table_info()") has changed the opencoded
> kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on
> the way because vmalloc has simply never fully supported __GFP_NORETRY
> semantic. This is still the case because e.g. page tables backing the
> vmalloc area are hardcoded GFP_KERNEL.
> 
> Revert back to __GFP_NORETRY as a poor man's defence against excessively
> large allocation request here. We will not rule out the OOM killer
> completely but __GFP_NORETRY should at least stop the large request
> in most cases.
> 
> Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in 
> xt_alloc_table_info()")
> Signed-off-by: Michal Hocko 
> ---
>  net/netfilter/x_tables.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index d8571f414208..a5f5c29bcbdc 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
> size)
>   if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
>   return NULL;

offtopic: preceding comment here is "prevent them from hitting BUG() in
vmalloc.c".  I suspect this is ancient code and vmalloc sure as heck
shouldn't go BUG with this input.  And it should be using `sz' ;)

So I suspect and hope that this code can be removed.  If not, let's fix
vmalloc!

> - info = kvmalloc(sz, GFP_KERNEL);
> + /*
> +  * __GFP_NORETRY is not fully supported by kvmalloc but it should
> +  * work reasonably well if sz is too large and bail out rather
> +  * than shoot all processes down before realizing there is nothing
> +  * more to reclaim.
> +  */
> + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>   if (!info)
>   return NULL;

checkpatch sayeth

networking block comments don't use an empty /* line, use /* Comment...

So I'll do that and shall scoot the patch Davewards.


[PATCH][next] wil6210: fix spelling mistake: "preperation"-> "preparation"

2018-01-30 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in debug error message text.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/ath/wil6210/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wil6210/main.c 
b/drivers/net/wireless/ath/wil6210/main.c
index 0c61a6c13991..04c8651274d9 100644
--- a/drivers/net/wireless/ath/wil6210/main.c
+++ b/drivers/net/wireless/ath/wil6210/main.c
@@ -715,7 +715,7 @@ static void wil_bl_prepare_halt(struct wil6210_priv *wil)
offsetof(struct bl_dedicated_registers_v0,
 boot_loader_struct_version));
if (!tmp) {
-   wil_dbg_misc(wil, "old BL, skipping halt preperation\n");
+   wil_dbg_misc(wil, "old BL, skipping halt preparation\n");
return;
}
 
-- 
2.15.1



Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

2018-01-30 Thread Stephen Hemminger
On Tue, 23 Jan 2018 10:34:04 +0100
Mohammed Gamal  wrote:

> Split each of the functions into two for each of send/recv buffers
> 
> Signed-off-by: Mohammed Gamal 

Splitting these functions is not necessary


[patch 1/1] net/netfilter/x_tables.c: make allocation less aggressive

2018-01-30 Thread akpm
From: Michal Hocko 
Subject: net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. 
This is an admin only interface but an admin in a namespace is sufficient
as well.  eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
fallback into kvmalloc.  It has dropped __GFP_NORETRY on the way because
vmalloc has simply never fully supported __GFP_NORETRY semantic.  This is
still the case because e.g.  page tables backing the vmalloc area are
hardcoded GFP_KERNEL.

Revert back to __GFP_NORETRY as a poor man's defence against excessively
large allocation request here.  We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request in
most cases.

[a...@linux-foundation.org: coding-style fixes]
Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
Link: http://lkml.kernel.org/r/20180130140104.ge21...@dhcp22.suse.cz
Signed-off-by: Michal Hocko 
Acked-by: Florian Westphal 
Reviewed-by: Andrew Morton 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 
---

 net/netfilter/x_tables.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff -puN 
net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
 net/netfilter/x_tables.c
--- 
a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
+++ a/net/netfilter/x_tables.c
@@ -1008,7 +1008,12 @@ struct xt_table_info *xt_alloc_table_inf
if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
return NULL;
 
-   info = kvmalloc(sz, GFP_KERNEL);
+   /* __GFP_NORETRY is not fully supported by kvmalloc but it should
+* work reasonably well if sz is too large and bail out rather
+* than shoot all processes down before realizing there is nothing
+* more to reclaim.
+*/
+   info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
if (!info)
return NULL;
 
_


Re: [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version

2018-01-30 Thread Stephen Hemminger
On Tue, 23 Jan 2018 10:34:05 +0100
Mohammed Gamal  wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> introduced a regression causing VMs not to shut down on pre-Win2016
> hosts after netvsc_remove_device() is called. This was caused as the
> GPADL teardown sequence was changed.
> 
> This patch restores the old behavior for pre-Win2016 hosts, while
> keeping the changes from 0cf7378 for Win2016 and higher hosts.
> 
> Signed-off-by: Mohammed Gamal 

Investigated the Windows driver to see how it handled this.
It uses NVSP version < 4 to check for older hosts. So that patch
should use that.

Currently testing a version with that change.


Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

2018-01-30 Thread Alexander Duyck
On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier  wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
>
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
>
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier 
> ---

Hi Benjamin,

How would you feel about resubmitting this patch for net?

We have some issues that have come up and it would be useful to have
this fixed in the kernel sooner rather than later. I would be okay
with us applying it for now while we work on coming up with a more
complete solution.

Thanks.

- Alex


Re: [patch 1/1] net/netfilter/x_tables.c: make allocation less aggressive

2018-01-30 Thread Eric Dumazet
On Tue, 2018-01-30 at 11:30 -0800, a...@linux-foundation.org wrote:
> From: Michal Hocko 
> Subject: net/netfilter/x_tables.c: make allocation less aggressive
> 
> syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. 
> This is an admin only interface but an admin in a namespace is sufficient
> as well.  eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
> xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
> fallback into kvmalloc.  It has dropped __GFP_NORETRY on the way because
> vmalloc has simply never fully supported __GFP_NORETRY semantic.  This is
> still the case because e.g.  page tables backing the vmalloc area are
> hardcoded GFP_KERNEL.
> 
> Revert back to __GFP_NORETRY as a poor man's defence against excessively
> large allocation request here.  We will not rule out the OOM killer
> completely but __GFP_NORETRY should at least stop the large request in
> most cases.
> 
> [a...@linux-foundation.org: coding-style fixes]
> Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
> Link: http://lkml.kernel.org/r/20180130140104.ge21...@dhcp22.suse.cz
> Signed-off-by: Michal Hocko 
> Acked-by: Florian Westphal 
> Reviewed-by: Andrew Morton 
> Cc: David S. Miller 
> Signed-off-by: Andrew Morton 
> ---
> 
>  net/netfilter/x_tables.c |7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff -puN 
> net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
>  net/netfilter/x_tables.c
> --- 
> a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-make-allocation-less-aggressive
> +++ a/net/netfilter/x_tables.c
> @@ -1008,7 +1008,12 @@ struct xt_table_info *xt_alloc_table_inf
>   if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
>   return NULL;
>  
> - info = kvmalloc(sz, GFP_KERNEL);
> + /* __GFP_NORETRY is not fully supported by kvmalloc but it should
> +  * work reasonably well if sz is too large and bail out rather
> +  * than shoot all processes down before realizing there is nothing
> +  * more to reclaim.
> +  */
> + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>   if (!info)
>   return NULL;


How is __GFP_NORETRY working exactly ?

Surely, if some firewall tool attempts to load new iptables rules, we
do not want to abort it if the request can be satisfied after a few
pages are moved to swap or written back to disk.

We want to avoid huge allocations, but let reasonable ones succeed.

Thanks.



Re: net: hang in unregister_netdevice: waiting for lo to become free

2018-01-30 Thread Daniel Borkmann
On 01/30/2018 07:32 PM, Cong Wang wrote:
> On Tue, Jan 30, 2018 at 4:09 AM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> The following program creates a hang in unregister_netdevice.
>> cleanup_net work hangs there forever periodically printing
>> "unregister_netdevice: waiting for lo to become free. Usage count = 3"
>> and creation of any new network namespaces hangs forever.
> 
> Interestingly, this is not reproducible on net-next.

The most recent change on netns refcnt was 4ee806d51176 ("net: tcp: close
sock if net namespace is exiting") in net/net-next from 5 days ago, maybe
fixed due to that?


Re: macvlan devices and vlan interaction

2018-01-30 Thread Shannon Nelson

On 1/29/2018 3:01 PM, Keller, Jacob E wrote:

Hi,

I'm currently investigating how macvlan devices behave in regards to vlan 
support, and found some interesting behavior that I am not sure how best to 
correct, or what the right path forward is.

If I create a macvlan device:

ip link add link ens0 name macvlan0 type macvlan:

and then add a VLAN to it:

ip link add link macvlan0 name vlan10 type vlan id 10

This works to pass VLAN 10 traffic over the macvlan device. This seems like 
expected behavior.

However, if I then also add vlan 10 to the lowerdev:

ip link add link ens0 name lowervlan10  type vlan id 10

Then traffic stops flowing to the VLAN on the macvlan device.

This happens, as far as I can tell, because of how the VLAN traffic is filtered 
first, and then forwarded to the VLAN device, which doesn't know about how the 
macvlan device exists.

It seems, essentially, that vlan stacked on top of a macvlan shouldn't work. 
Because the vlan code basically expects each vlan to apply to every MAC 
address, and the macvlan device works by putting its MAC address into the 
unicast address list, there's no way for a device driver to know when or how to 
apply the vlan.

This gets a bit more confusing when we add in the l2 fwd hardware offload.

Currently, at least for the Intel network parts, this isn't supported, because 
of a bug in which the device drivers don't apply the VLANs to the macvlan 
accelerated addresses. If we fix this, at least for fm10k, the behavior is 
slightly better, because of how the hardware filtering at the MAC address 
happens first, and we direct the traffic to the proper device regardless of 
VLAN.

In addition to this peculiarity of VLANs on both the macvlan and lowerdev, is 
that when a macvlan device adds a VLAN, the lowerdev gets an indication to add 
the vlan via its .ndo_vlan_rx_add_vid(), which doesn't distinguish between 
which addresses the VLAN might apply to. It thus simply, depending on hardware 
design, enables the VLAN for all its unicast and multicast addresses. Some 
hardware could theoretically support MAC+VLAN pairs, where it could distinguish 
that a VLAN should only be added for some subset of addresses. Other hardware 
might not be so lucky..

Unfortunately, this has the weird consequence that if we have the following 
stack of devices:

vlan10@macvlan0
macvlan0@ens0
ens0

Then ens0 will receive VLAN10 traffic on every address. So VLAN 10 traffic 
destined to the MAC of the lowerdev will be received, instead of dropped.
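
A toy model may make this leak easier to see. The sketch below is not
taken from any driver: struct rx_filter, filter_add_mac(), filter_add_vid()
and rx_accept() are hypothetical names, and it assumes, as described above,
that the unicast address list and the VLAN list are kept independently of
each other.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_MACS 4
#define MAX_VIDS 4

/* Two independent tables: nothing ties a VLAN ID to a particular MAC. */
struct rx_filter {
    uint8_t  macs[MAX_MACS][6];
    int      n_macs;
    uint16_t vids[MAX_VIDS];
    int      n_vids;
};

bool filter_add_mac(struct rx_filter *f, const uint8_t mac[6])
{
    if (f->n_macs >= MAX_MACS)
        return false;
    memcpy(f->macs[f->n_macs++], mac, 6);
    return true;
}

bool filter_add_vid(struct rx_filter *f, uint16_t vid)
{
    if (f->n_vids >= MAX_VIDS)
        return false;
    f->vids[f->n_vids++] = vid;
    return true;
}

/* Accept a frame when its destination MAC is listed AND its VLAN ID is
 * enabled; the two checks know nothing about each other. */
bool rx_accept(const struct rx_filter *f, const uint8_t mac[6], uint16_t vid)
{
    bool mac_ok = false, vid_ok = false;
    int i;

    for (i = 0; i < f->n_macs; i++)
        if (memcmp(f->macs[i], mac, 6) == 0)
            mac_ok = true;
    for (i = 0; i < f->n_vids; i++)
        if (f->vids[i] == vid)
            vid_ok = true;

    return mac_ok && vid_ok;
}
```

In this model, once the lowerdev MAC and the macvlan MAC are both listed
and VLAN 10 is enabled on behalf of the macvlan, a VLAN 10 frame addressed
to the lowerdev MAC is accepted as well. A MAC+VLAN pair table would be
needed to tell the two cases apart.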

If we add VLAN 10 to the lowerdev so we have both the above stack and also

lowervlan10@ens0
ens0 (mac gg:hh:ii:jj:kk)

then all vlan 10 traffic will be received on the lowerdev VLAN 10, without any 
being forwarded to the VLAN10 attached to the macvlan.

However, if we add two macvlans, and each add the vlan10, so we have the 
following:

avlan10@macvlan0
macvlan0@ens0
ens0

bvlan10@macvlan1
macvlan1@ens0
ens0

In this case, it does appear that traffic is sorted out correctly. It seems 
that only if the lowerdev gets the VLAN does it end up breaking. If I remove 
bvlan10 from macvlan1, the traffic associated with vlan10 is still received by 
macvlan1, even though in principle it should no longer be.

What is the correct behavior here? Should this just be "administrators should know 
better"? I don't think that's a great argument, and either way we're still 
essentially leaking VLANs across the macvlan interfaces, which I don't think is ideal.

I see two possible solutions:

1) modify macvlan driver so that it is marked as VLAN_CHALLENGED, and thus 
indicate it cannot handle VLAN traffic on top of it.
   a. In order to get the VLANs associated, administrator could instead add the 
VLAN first, and then add the macvlan on top. This I think is a better 
configuration.
   b. that doesn't work in the offload case, unless/until we fix the VLAN 
interface to forward the l2_dfwd_add_station() along with a vid.
   c. this could appear as loss of functionality, since in some cases these 
VLAN on top of macvlan work today (with the interesting caveats listed above).

2) modify how VLANs interact with MAC addresses, so that the lowerdev can 
explicitly be aware of which VLANs are tied to which address groups, in order 
to allow for the explicit configuration of which MAC+VLAN pairs are actually 
allowed.
   a. this is a much more invasive change to driver interface, and more 
difficult to get right
   b. possibly other configurations of stacked devices might have a similar 
problem, so we could solve more here? Or create more problems.. I'm not really 
certain.


I think the correct solution is (1) but I wasn't sure what others thought, and 
whether anyone else has encountered the problems I mention and outline above. I 
cc'd Alex who I discussed with offline when I first heard of and began 
investigating this, in case he has anything further to add.

Regards,
Jake



Hi Jake,

The current behavior seems logical to me, but I suppose Alex might argu

Re: macvlan devices and vlan interaction

2018-01-30 Thread Alexander Duyck
On Tue, Jan 30, 2018 at 12:29 PM, Shannon Nelson
 wrote:
> On 1/29/2018 3:01 PM, Keller, Jacob E wrote:
>>
>> Hi,
>>
>> I'm currently investigating how macvlan devices behave in regards to vlan
>> support, and found some interesting behavior that I am not sure how best to
>> correct, or what the right path forward is.
>>
>> If I create a macvlan device:
>>
>> ip link add link ens0 name macvlan0 type macvlan
>>
>> and then add a VLAN to it:
>>
>> ip link add link macvlan0 name vlan10 type vlan id 10
>>
>> This works to pass VLAN 10 traffic over the macvlan device. This seems
>> like expected behavior.
>>
>> However, if I then also add vlan 10 to the lowerdev:
>>
>> ip link add link ens0 name lowervlan10  type vlan id 10
>>
>> Then traffic stops flowing to the VLAN on the macvlan device.
>>
>> This happens, as far as I can tell, because of how the VLAN traffic is
>> filtered first, and then forwarded to the VLAN device, which doesn't know
>> about how the macvlan device exists.
>>
>> It seems, essentially, that vlan stacked on top of a macvlan shouldn't
>> work. Because the vlan code basically expects each vlan to apply to every
>> MAC address, and the macvlan device works by putting its MAC address into
>> the unicast address list, there's no way for a device driver to know when or
>> how to apply the vlan.
>>
>> This gets a bit more confusing when we add in the l2 fwd hardware offload.
>>
>> Currently, at least for the Intel network parts, this isn't supported,
>> because of a bug in which the device drivers don't apply the VLANs to the
>> macvlan accelerated addresses. If we fix this, at least for fm10k, the
>> behavior is slightly better, because of how the hardware filtering at the
>> MAC address happens first, and we direct the traffic to the proper device
>> regardless of VLAN.
>>
>> In addition to this peculiarity of VLANs on both the macvlan and lowerdev,
>> is that when a macvlan device adds a VLAN, the lowerdev gets an indication
>> to add the vlan via its .ndo_vlan_rx_add_vid(), which doesn't distinguish
>> between which addresses the VLAN might apply to. It thus simply, depending
>> on hardware design, enables the VLAN for all its unicast and multicast
>> addresses. Some hardware could theoretically support MAC+VLAN pairs, where
>> it could distinguish that a VLAN should only be added for some subset of
>> addresses. Other hardware might not be so lucky..
>>
>> Unfortunately, this has the weird consequence that if we have the
>> following stack of devices:
>>
>> vlan10@macvlan0
>> macvlan0@ens0
>> ens0
>>
>> Then ens0 will receive VLAN10 traffic on every address. So VLAN 10 traffic
>> destined to the MAC of the lowerdev will be received, instead of dropped.
>>
>> If we add VLAN 10 to the lowerdev so we have both the above stack and also
>>
>> lowervlan10@ens0
>> ens0 (mac gg:hh:ii:jj:kk)
>>
>> then all vlan 10 traffic will be received on the lowerdev VLAN 10, without
>> any being forwarded to the VLAN10 attached to the macvlan.
>>
>> However, if we add two macvlans, and each add the vlan10, so we have the
>> following:
>>
>> avlan10@macvlan0
>> macvlan0@ens0
>> ens0
>>
>> bvlan10@macvlan1
>> macvlan1@ens0
>> ens0
>>
>> In this case, it does appear that traffic is sorted out correctly. It
>> seems that only if the lowerdev gets the VLAN does it end up breaking. If I
>> remove bvlan10 from macvlan1, the traffic associated with vlan10 is still
>> received by macvlan1, even though in principle it should no longer be.
>>
>> What is the correct behavior here? Should this just be "administrators
>> should know better"? I don't think that's a great argument, and either way
>> we're still essentially leaking VLANs across the macvlan interfaces, which I
>> don't think is ideal.
>>
>> I see two possible solutions:
>>
>> 1) modify macvlan driver so that it is marked as VLAN_CHALLENGED, and thus
>> indicate it cannot handle VLAN traffic on top of it.
>>    a. In order to get the VLANs associated, administrator could instead
>> add the VLAN first, and then add the macvlan on top. This I think is a
>> better configuration.
>>    b. that doesn't work in the offload case, unless/until we fix the VLAN
>> interface to forward the l2_dfwd_add_station() along with a vid.
>>    c. this could appear as loss of functionality, since in some cases
>> these VLAN on top of macvlan work today (with the interesting caveats listed
>> above).
>>
>> 2) modify how VLANs interact with MAC addresses, so that the lowerdev can
>> explicitly be aware of which VLANs are tied to which address groups, in
>> order to allow for the explicit configuration of which MAC+VLAN pairs are
>> actually allowed.
>>    a. this is a much more invasive change to driver interface, and more
>> difficult to get right
>>    b. possibly other configurations of stacked devices might have a
>> similar problem, so we could solve more here? Or create more problems.. I'm
>> not really certain.
>>
>>
>> I think the correct solution is (1) but I wasn't sure what

[PATCH bpf-next v8 0/5] libbpf: add XDP binding support

2018-01-30 Thread Eric Leblond

Hello Daniel,

No problem with the delay in the answer. I'm doing far worse.

Here is an updated version:
- add if_link.h in uapi and remove the definition
- fix a commit message
- remove uapi from an include

Best Regards,
--
Eric


[PATCH bpf-next v8 5/5] samples/bpf: use bpf_set_link_xdp_fd

2018-01-30 Thread Eric Leblond
Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some
code duplication and benefit from netlink ext ack error messages.

Signed-off-by: Eric Leblond 
---
 samples/bpf/bpf_load.c  | 102 
 samples/bpf/bpf_load.h  |   2 +-
 samples/bpf/xdp1_user.c |   4 +-
 samples/bpf/xdp_redirect_cpu_user.c |   6 +--
 samples/bpf/xdp_redirect_map_user.c |   8 +--
 samples/bpf/xdp_redirect_user.c |   8 +--
 samples/bpf/xdp_router_ipv4_user.c  |  10 ++--
 samples/bpf/xdp_rxq_info_user.c |   4 +-
 samples/bpf/xdp_tx_iptunnel_user.c  |   6 +--
 9 files changed, 24 insertions(+), 126 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 242631aa4ea2..69806d74fa53 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -695,105 +695,3 @@ struct ksym *ksym_search(long key)
 	return &syms[0];
 }
 
-int set_link_xdp_fd(int ifindex, int fd, __u32 flags)
-{
-	struct sockaddr_nl sa;
-	int sock, seq = 0, len, ret = -1;
-	char buf[4096];
-	struct nlattr *nla, *nla_xdp;
-	struct {
-		struct nlmsghdr  nh;
-		struct ifinfomsg ifinfo;
-		char attrbuf[64];
-	} req;
-	struct nlmsghdr *nh;
-	struct nlmsgerr *err;
-
-	memset(&sa, 0, sizeof(sa));
-	sa.nl_family = AF_NETLINK;
-
-	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
-	if (sock < 0) {
-		printf("open netlink socket: %s\n", strerror(errno));
-		return -1;
-	}
-
-	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
-		printf("bind to netlink: %s\n", strerror(errno));
-		goto cleanup;
-	}
-
-	memset(&req, 0, sizeof(req));
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_type = RTM_SETLINK;
-	req.nh.nlmsg_pid = 0;
-	req.nh.nlmsg_seq = ++seq;
-	req.ifinfo.ifi_family = AF_UNSPEC;
-	req.ifinfo.ifi_index = ifindex;
-
-	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | 43/*IFLA_XDP*/;
-	nla->nla_len = NLA_HDRLEN;
-
-	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = 1/*IFLA_XDP_FD*/;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
-
-	/* if user passed in any flags, add those too */
-	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = 3/*IFLA_XDP_FLAGS*/;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
-	}
-
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
-
-	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
-		printf("send to netlink: %s\n", strerror(errno));
-		goto cleanup;
-	}
-
-	len = recv(sock, buf, sizeof(buf), 0);
-	if (len < 0) {
-		printf("recv from netlink: %s\n", strerror(errno));
-		goto cleanup;
-	}
-
-	for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
-	     nh = NLMSG_NEXT(nh, len)) {
-		if (nh->nlmsg_pid != getpid()) {
-			printf("Wrong pid %d, expected %d\n",
-			       nh->nlmsg_pid, getpid());
-			goto cleanup;
-		}
-		if (nh->nlmsg_seq != seq) {
-			printf("Wrong seq %d, expected %d\n",
-			       nh->nlmsg_seq, seq);
-			goto cleanup;
-		}
-		switch (nh->nlmsg_type) {
-		case NLMSG_ERROR:
-			err = (struct nlmsgerr *)NLMSG_DATA(nh);
-			if (!err->error)
-				continue;
-			printf("nlmsg error %s\n", strerror(-err->error));
-			goto cleanup;
-		case NLMSG_DONE:
-			break;
-		}
-	}
-
-	ret = 0;
-
-cleanup:
-	close(sock);
-	return ret;
-}
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 7d57a4248893..453c200b389b 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -61,5 +61,5 @@ struct ksym {
 
 int load_kallsyms(void);
 struct ksym *ksym_search(long key);
-int set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
 #endif
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index fdaefe91801d..b901ee2b3336 10

[PATCH bpf-next v8 1/5] tools: add netlink.h and if_link.h in tools uapi

2018-01-30 Thread Eric Leblond
The headers are necessary for libbpf compilation on systems with older
versions of the headers.

Signed-off-by: Eric Leblond 
---
 tools/include/uapi/linux/if_link.h | 943 +
 tools/include/uapi/linux/netlink.h | 251 ++
 tools/lib/bpf/Makefile |   6 +
 3 files changed, 1200 insertions(+)
 create mode 100644 tools/include/uapi/linux/if_link.h
 create mode 100644 tools/include/uapi/linux/netlink.h

diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
new file mode 100644
index ..8616131e2c61
--- /dev/null
+++ b/tools/include/uapi/linux/if_link.h
@@ -0,0 +1,943 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_IF_LINK_H
+#define _UAPI_LINUX_IF_LINK_H
+
+#include 
+#include 
+
+/* This struct should be in sync with struct rtnl_link_stats64 */
+struct rtnl_link_stats {
+   __u32   rx_packets; /* total packets received   */
+   __u32   tx_packets; /* total packets transmitted*/
+   __u32   rx_bytes;   /* total bytes received */
+   __u32   tx_bytes;   /* total bytes transmitted  */
+   __u32   rx_errors;  /* bad packets received */
+   __u32   tx_errors;  /* packet transmit problems */
+   __u32   rx_dropped; /* no space in linux buffers*/
+   __u32   tx_dropped; /* no space available in linux  */
+   __u32   multicast;  /* multicast packets received   */
+   __u32   collisions;
+
+   /* detailed rx_errors: */
+   __u32   rx_length_errors;
+   __u32   rx_over_errors; /* receiver ring buff overflow  */
+   __u32   rx_crc_errors;  /* recved pkt with crc error*/
+   __u32   rx_frame_errors;/* recv'd frame alignment error */
+   __u32   rx_fifo_errors; /* recv'r fifo overrun  */
+   __u32   rx_missed_errors;   /* receiver missed packet   */
+
+   /* detailed tx_errors */
+   __u32   tx_aborted_errors;
+   __u32   tx_carrier_errors;
+   __u32   tx_fifo_errors;
+   __u32   tx_heartbeat_errors;
+   __u32   tx_window_errors;
+
+   /* for cslip etc */
+   __u32   rx_compressed;
+   __u32   tx_compressed;
+
+   __u32   rx_nohandler;   /* dropped, no handler found*/
+};
+
+/* The main device statistics structure */
+struct rtnl_link_stats64 {
+   __u64   rx_packets; /* total packets received   */
+   __u64   tx_packets; /* total packets transmitted*/
+   __u64   rx_bytes;   /* total bytes received */
+   __u64   tx_bytes;   /* total bytes transmitted  */
+   __u64   rx_errors;  /* bad packets received */
+   __u64   tx_errors;  /* packet transmit problems */
+   __u64   rx_dropped; /* no space in linux buffers*/
+   __u64   tx_dropped; /* no space available in linux  */
+   __u64   multicast;  /* multicast packets received   */
+   __u64   collisions;
+
+   /* detailed rx_errors: */
+   __u64   rx_length_errors;
+   __u64   rx_over_errors; /* receiver ring buff overflow  */
+   __u64   rx_crc_errors;  /* recved pkt with crc error*/
+   __u64   rx_frame_errors;/* recv'd frame alignment error */
+   __u64   rx_fifo_errors; /* recv'r fifo overrun  */
+   __u64   rx_missed_errors;   /* receiver missed packet   */
+
+   /* detailed tx_errors */
+   __u64   tx_aborted_errors;
+   __u64   tx_carrier_errors;
+   __u64   tx_fifo_errors;
+   __u64   tx_heartbeat_errors;
+   __u64   tx_window_errors;
+
+   /* for cslip etc */
+   __u64   rx_compressed;
+   __u64   tx_compressed;
+
+   __u64   rx_nohandler;   /* dropped, no handler found*/
+};
+
+/* The struct should be in sync with struct ifmap */
+struct rtnl_link_ifmap {
+   __u64   mem_start;
+   __u64   mem_end;
+   __u64   base_addr;
+   __u16   irq;
+   __u8    dma;
+   __u8    port;
+};
+
+/*
+ * IFLA_AF_SPEC
+ *   Contains nested attributes for address family specific attributes.
+ *   Each address family may create a attribute with the address family
+ *   number as type and create its own attribute structure in it.
+ *
+ *   Example:
+ *   [IFLA_AF_SPEC] = {
+ *   [AF_INET] = {
+ *   [IFLA_INET_CONF] = ...,
+ *   },
+ *   [AF_INET6] = {
+ *   [IFLA_INET6_FLAGS] = ...,
+ *   [IFLA_INET6_CONF] = ...,
+ *   }
+ *   }
+ */
+
+enum {
+   IFLA_UNSPEC,
+   IFLA_ADDRESS,
+   IFLA_BROADCAST,
+   IFLA_IFNAME,
+   IFLA_MTU,
+   IFLA_LINK,
+   IFLA_QDISC,
+   IFLA_STATS,
+   IFLA_COST,
+#define IFLA_COST IFLA_COST
+   IFLA_PRIORIT

[PATCH bpf-next v8 3/5] libbpf: add error reporting in XDP

2018-01-30 Thread Eric Leblond
Parse netlink ext attribute to get the error message returned by
the card. Code is partially taken from libnl.

We add netlink.h to the uapi include of tools. We also need to
avoid including the userspace netlink header for the samples to
build successfully, so nlattr.h has a define that prevents the
inclusion. Using a direct define could have been an issue
as NLMSGERR_ATTR_MAX can change in the future.

We also define SOL_NETLINK if not defined, to avoid having to
copy socket.h for a fixed value.
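
For reference, the socket option this patch relies on can be exercised on its
own. The helper below is a minimal sketch and not the libbpf code:
enable_ext_ack() is a made-up name, and it assumes the installed
linux/netlink.h already defines NETLINK_EXT_ACK. The SOL_NETLINK fallback is
the same one the patch adds.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/*
 * Hypothetical helper: ask the kernel to attach extended ACK attributes
 * (such as an error string) to netlink error replies on this socket.
 * Returns 0 on success or a negative errno value.
 */
static int enable_ext_ack(int sock)
{
	int one = 1;

	if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
		       &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}
```

A caller can treat a failure here as non-fatal, as the patch does, since the
request itself still works without extended error reporting.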

Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 samples/bpf/Makefile   |   2 +-
 tools/lib/bpf/Build|   2 +-
 tools/lib/bpf/bpf.c|  11 +++
 tools/lib/bpf/nlattr.c | 187 +
 tools/lib/bpf/nlattr.h |  72 +++
 5 files changed, 272 insertions(+), 2 deletions(-)
 create mode 100644 tools/lib/bpf/nlattr.c
 create mode 100644 tools/lib/bpf/nlattr.h

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 64335bb94f9f..ec3fc8d88e87 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -45,7 +45,7 @@ hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 
 # Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o
+LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
 CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 
 test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index d8749756352d..64c679d67109 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o
+libbpf-y := libbpf.o bpf.o nlattr.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index bf2772566240..9c88f6e4156d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -32,6 +32,10 @@
 #include 
 #include 
 
+#ifndef SOL_NETLINK
+#define SOL_NETLINK 270
+#endif
+
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
  * required to be defined explicitly.
@@ -436,6 +440,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	struct nlmsghdr *nh;
 	struct nlmsgerr *err;
 	socklen_t addrlen;
+	int one = 1;
 
 	memset(&sa, 0, sizeof(sa));
 	sa.nl_family = AF_NETLINK;
@@ -445,6 +450,11 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 		return -errno;
 	}
 
+	if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
+		       &one, sizeof(one)) < 0) {
+		fprintf(stderr, "Netlink error reporting not supported\n");
+	}
+
 	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
 		ret = -errno;
 		goto cleanup;
@@ -521,6 +531,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 			if (!err->error)
 				continue;
 			ret = err->error;
+			nla_dump_errormsg(nh);
 			goto cleanup;
 		case NLMSG_DONE:
 			break;
diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c
new file mode 100644
index ..4719434278b2
--- /dev/null
+++ b/tools/lib/bpf/nlattr.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: LGPL-2.1
+
+/*
+ * NETLINK  Netlink attributes
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation version 2.1
+ * of the License.
+ *
+ * Copyright (c) 2003-2013 Thomas Graf 
+ */
+
+#include 
+#include "nlattr.h"
+#include 
+#include 
+#include 
+
+static uint16_t nla_attr_minlen[NLA_TYPE_MAX+1] = {
+	[NLA_U8]	= sizeof(uint8_t),
+	[NLA_U16]	= sizeof(uint16_t),
+	[NLA_U32]	= sizeof(uint32_t),
+	[NLA_U64]	= sizeof(uint64_t),
+	[NLA_STRING]	= 1,
+	[NLA_FLAG]	= 0,
+};
+
+static int nla_len(const struct nlattr *nla)
+{
+	return nla->nla_len - NLA_HDRLEN;
+}
+
+static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
+{
+	int totlen = NLA_ALIGN(nla->nla_len);
+
+	*remaining -= totlen;
+	return (struct nlattr *) ((char *) nla + totlen);
+}
+
+static int nla_ok(const struct nlattr *nla, int remaining)
+{
+	return remaining >= sizeof(*nla) &&
+	       nla->nla_len >= sizeof(*nla) &&
+	       nla->nla_len <= remaining;
+}
+
+static void *nla_data(const struct nlattr *nla)
+{
+	return (char *) nla + NLA_HDRLEN;
+}
+
+static int nla_type(const struct nlattr *nla)
+{
+	return nla->nla_type & NLA_TYPE_MASK;
+}
+
+static int validate_nla(struct nlattr *nla, int maxtype,
+			struct nla_policy *policy)
+{
+	struct nla_policy *pt;
+	unsigned int minlen = 0;
+	int type = nla_type(nla);
+
+	if (type < 0 || type > maxtype)
+		return 0;
+
+	pt = &policy[type];
+
+	if (pt->type > NLA_TYPE_MAX)
+   

[PATCH bpf-next v8 4/5] libbpf: add missing SPDX-License-Identifier

2018-01-30 Thread Eric Leblond
Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.c| 2 ++
 tools/lib/bpf/bpf.h| 2 ++
 tools/lib/bpf/libbpf.c | 2 ++
 tools/lib/bpf/libbpf.h | 2 ++
 4 files changed, 8 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9c88f6e4156d..592a58a2b681 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: LGPL-2.1
+
 /*
  * common eBPF ELF operations.
  *
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9f44c196931e..8d18fb73d7fb 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -1,3 +1,5 @@
+/* SPDX-License-Identifier: LGPL-2.1 */
+
 /*
  * common eBPF ELF operations.
  *
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c60122d3ea85..71ddc481f349 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: LGPL-2.1
+
 /*
  * Common eBPF ELF object loading operations.
  *
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e42f96900318..f85906533cdd 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1,3 +1,5 @@
+/* SPDX-License-Identifier: LGPL-2.1 */
+
 /*
  * Common eBPF ELF object loading operations.
  *
-- 
2.15.1



[PATCH bpf-next v8 2/5] libbpf: add function to setup XDP

2018-01-30 Thread Eric Leblond
Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
slightly modified to be library compliant.
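
The part of this function that is easiest to get wrong is the nested IFLA_XDP
attribute. The standalone sketch below lays out the same attribute in a plain
buffer so the lengths can be checked without opening a netlink socket.
build_xdp_attr() and put_attr() are names made up for this illustration, not
libbpf API, and the code assumes uapi headers recent enough to define
IFLA_XDP, IFLA_XDP_FD and IFLA_XDP_FLAGS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/if_link.h>

/* Hypothetical helper: write one attribute at dst, return its aligned size. */
static size_t put_attr(unsigned char *dst, uint16_t type,
		       const void *data, uint16_t len)
{
	struct nlattr hdr;

	hdr.nla_len = (uint16_t)(NLA_HDRLEN + len);
	hdr.nla_type = type;
	memcpy(dst, &hdr, sizeof(hdr));
	memcpy(dst + NLA_HDRLEN, data, len);
	return NLA_ALIGN(hdr.nla_len);
}

/*
 * Hypothetical helper: build IFLA_XDP { IFLA_XDP_FD [, IFLA_XDP_FLAGS] }
 * at the start of buf. Returns the nested attribute length, or -1 if buf
 * is too small.
 */
static int build_xdp_attr(unsigned char *buf, size_t buflen,
			  int fd, uint32_t flags)
{
	size_t need = NLA_HDRLEN + NLA_ALIGN(NLA_HDRLEN + sizeof(fd));
	size_t off = NLA_HDRLEN;	/* children start after the nest header */
	struct nlattr nest;

	if (flags)
		need += NLA_ALIGN(NLA_HDRLEN + sizeof(flags));
	if (buflen < need)
		return -1;

	off += put_attr(buf + off, IFLA_XDP_FD, &fd, sizeof(fd));
	/* as in the patch, flags are only sent when non-zero */
	if (flags)
		off += put_attr(buf + off, IFLA_XDP_FLAGS,
				&flags, sizeof(flags));

	nest.nla_len = (uint16_t)off;
	nest.nla_type = NLA_F_NESTED | IFLA_XDP;
	memcpy(buf, &nest, sizeof(nest));
	return (int)off;
}
```

With a 4-byte header and 4-byte payloads, the nest is 12 bytes when only the
fd is sent and 20 bytes when flags are added, which matches the nla_len
arithmetic in the function below.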

Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.c| 122 +
 tools/lib/bpf/libbpf.c |   2 +
 tools/lib/bpf/libbpf.h |   4 ++
 3 files changed, 128 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5128677e4117..bf2772566240 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -25,6 +25,12 @@
 #include 
 #include 
 #include "bpf.h"
+#include "libbpf.h"
+#include "nlattr.h"
+#include 
+#include 
+#include 
+#include 
 
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
@@ -46,7 +52,9 @@
 # endif
 #endif
 
+#ifndef min
 #define min(x, y) ((x) < (y) ? (x) : (y))
+#endif
 
 static inline __u64 ptr_to_u64(const void *ptr)
 {
@@ -413,3 +421,117 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 
 	return err;
 }
+
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	struct sockaddr_nl sa;
+	int sock, seq = 0, len, ret = -1;
+	char buf[4096];
+	struct nlattr *nla, *nla_xdp;
+	struct {
+		struct nlmsghdr  nh;
+		struct ifinfomsg ifinfo;
+		char attrbuf[64];
+	} req;
+	struct nlmsghdr *nh;
+	struct nlmsgerr *err;
+	socklen_t addrlen;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.nl_family = AF_NETLINK;
+
+	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+	if (sock < 0) {
+		return -errno;
+	}
+
+	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	addrlen = sizeof(sa);
+	if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	if (addrlen != sizeof(sa)) {
+		ret = -LIBBPF_ERRNO__INTERNAL;
+		goto cleanup;
+	}
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_type = RTM_SETLINK;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.ifinfo.ifi_family = AF_UNSPEC;
+	req.ifinfo.ifi_index = ifindex;
+
+	/* started nested attribute for XDP */
+	nla = (struct nlattr *)(((char *)&req)
+				+ NLMSG_ALIGN(req.nh.nlmsg_len));
+	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_len = NLA_HDRLEN;
+
+	/* add XDP fd */
+	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+	nla_xdp->nla_type = IFLA_XDP_FD;
+	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
+	nla->nla_len += nla_xdp->nla_len;
+
+	/* if user passed in any flags, add those too */
+	if (flags) {
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = IFLA_XDP_FLAGS;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
+		nla->nla_len += nla_xdp->nla_len;
+	}
+
+	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	len = recv(sock, buf, sizeof(buf), 0);
+	if (len < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
+	     nh = NLMSG_NEXT(nh, len)) {
+		if (nh->nlmsg_pid != sa.nl_pid) {
+			ret = -LIBBPF_ERRNO__WRNGPID;
+			goto cleanup;
+		}
+		if (nh->nlmsg_seq != seq) {
+			ret = -LIBBPF_ERRNO__INVSEQ;
+			goto cleanup;
+		}
+		switch (nh->nlmsg_type) {
+		case NLMSG_ERROR:
+			err = (struct nlmsgerr *)NLMSG_DATA(nh);
+			if (!err->error)
+				continue;
+			ret = err->error;
+			goto cleanup;
+		case NLMSG_DONE:
+			break;
+		default:
+			break;
+		}
+	}
+
+	ret = 0;
+
+cleanup:
+	close(sock);
+	return ret;
+}
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 30c776375118..c60122d3ea85 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -106,6 +106,8 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(PROG2BIG)]  = "Program too big",
[ERRCODE_OFFSET(KVER)]  = "Incorrect kernel version",
   

Re: net: hang in unregister_netdevice: waiting for lo to become free

2018-01-30 Thread David Ahern
On 1/30/18 1:08 PM, Daniel Borkmann wrote:
> On 01/30/2018 07:32 PM, Cong Wang wrote:
>> On Tue, Jan 30, 2018 at 4:09 AM, Dmitry Vyukov  wrote:
>>> Hello,
>>>
>>> The following program creates a hang in unregister_netdevice.
>>> cleanup_net work hangs there forever periodically printing
>>> "unregister_netdevice: waiting for lo to become free. Usage count = 3"
>>> and creation of any new network namespaces hangs forever.
>>
>> Interestingly, this is not reproducible on net-next.
> 
> The most recent change on netns refcnt was 4ee806d51176 ("net: tcp: close
> sock if net namespace is exiting") in net/net-next from 5 days ago, maybe
> fixed due to that?
> 

This appears to be the commit introducing the refcnt leak:

$ git bisect bad
dbc2b5e9a09e9a6664679a667ff81cff6e5f2641 is the first bad commit
commit dbc2b5e9a09e9a6664679a667ff81cff6e5f2641
Author: Xin Long 
Date:   Fri May 12 14:39:52 2017 +0800

sctp: fix src address selection if using secondary addresses for ipv6


v4.14 is bad. Running bisect in the background while doing other things


Re: sctp netns "unregister_netdevice: waiting for lo to become free. Usage count = 1"

2018-01-30 Thread Neil Horman
On Tue, Jan 30, 2018 at 09:24:17PM +0200, Tommi Rantala wrote:
> On 30.01.2018 17:59, Neil Horman wrote:
> > On Mon, Jan 29, 2018 at 05:55:45PM +0200, Tommi Rantala wrote:
> > > 
> > > ip netns add TEST
> > > ip netns exec TEST ip link set lo up
> > > ip link add dummy0 type dummy
> > > ip link add dummy1 type dummy
> > > ip link add dummy2 type dummy
> > > ip link set dev dummy0 netns TEST
> > > ip link set dev dummy1 netns TEST
> > > ip link set dev dummy2 netns TEST
> > > ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0
> > > ip netns exec TEST ip link set dummy0 up
> > > ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1
> > > ip netns exec TEST ip link set dummy1 up
> > > ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2
> > > ip netns exec TEST ip link set dummy2 up
> > > ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 2 -s -B 192.168.1.3
> > > ip netns del TEST
> > > 
> > Does the problem occur if you don't set lo up?
> 
> Still happens after dropping "ip netns exec TEST ip link set lo up".
> 
> Omitting "-B 192.168.1.3" from the sctp_test args helps.
> 
That's weird.  I'll look at the sctp_test code in the AM.
Neil

> Tommi
> 


Re: general protection fault in __rds_rdma_map

2018-01-30 Thread Eric Biggers
On Mon, Nov 27, 2017 at 10:30:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> e1d1ea549b57790a3d8cf6300e6ef86118d692a3
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> RDS: rds_bind could not find a transport for 224.0.0.2, load rds_tcp or
> rds_rdma?
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 3078 Comm: syzkaller719569 Not tainted 4.14.0+ #189
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: 8801cbbda580 task.stack: 8801cb8d
> RIP: 0010:__rds_rdma_map+0x133/0x1050 net/rds/rdma.c:191
> RSP: 0018:8801cb8d7a28 EFLAGS: 00010206
> RAX: dc00 RBX: 8801cb8d7bd0 RCX: 84c0b20d
> RDX: 0018 RSI: 8801cb8d7bd0 RDI: 00c0
> RBP: 8801cb8d7b90 R08: ed003971af96 R09: ed003971af96
> R10:  R11: ed003971af95 R12: 
> R13: 8801cb407480 R14:  R15: 8801cb407480
> FS:  7fb0be5a3700() GS:8801db50() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fb0be5a2e78 CR3: 0001cfc07000 CR4: 001406e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  rds_get_mr_for_dest+0x1bb/0x290 net/rds/rdma.c:357
>  rds_setsockopt+0x6b9/0x970 net/rds/af_rds.c:347
>  SYSC_setsockopt net/socket.c:1851 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1830
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x44a789
> RSP: 002b:7fb0be5a2dc8 EFLAGS: 0202 ORIG_RAX: 0036
> RAX: ffda RBX:  RCX: 0044a789
> RDX: 0007 RSI: 4114 RDI: 0004
> RBP: 0086 R08: 00a0 R09: 7fb0be5a3700
> R10: 2ffc R11: 0202 R12: 
> R13: 007efe3f R14: 7fb0be5a39c0 R15: 
> Code: 57 0d 00 00 48 8b 85 f0 fe ff ff 4c 8b a0 c0 04 00 00 48 b8 00 00 00
> 00 00 fc ff df 49 8d bc 24 c0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
> 85 6a 0e 00 00 49 83 bc 24 c0 00 00 00 00 0f 84
> RIP: __rds_rdma_map+0x133/0x1050 net/rds/rdma.c:191 RSP: 8801cb8d7a28
> ---[ end trace 5e0e31770c7b70a7 ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> Please credit me with: Reported-by: syzbot 
> 
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title

Crash is no longer occurring, apparently was fixed by:

#syz fix: rds: Fix NULL pointer dereference in __rds_rdma_map


Re: WARNING in xfrm_state_fini

2018-01-30 Thread Eric Biggers
On Mon, Nov 27, 2017 at 09:37:07AM -0800, Cong Wang wrote:
> On Mon, Nov 27, 2017 at 3:55 AM, Steffen Klassert
>  wrote:
> > On Tue, Nov 21, 2017 at 06:44:04PM -0800, Cong Wang wrote:
> >> User-space uses proto==0 as a wildcard, but xfrm_id_proto_match()
> >> doesn't consider it as a match with IPSEC_PROTO_ANY, in this case
> >> it should match all. Not sure if the following patch is the best way to
> >> fix it, or perhaps x->id.proto should be initialized to some of these 3
> >> values, but looking into ->init_temprop() it is not the case.
> >
> > x->id is copied from the policy template and it seems that we don't
> > validate the id of the template when inserting the policy. iproute2
> > checks for a valid IPsec proto but the kernel does not do so. I think
> > we should check the policy template and reject inserting if the proto
> > is invalid.
> >
> 
> Oh, I thought 0 is used as wildcard, so it is not.
> 
> Something like below?
> 
> @@ -1445,6 +1446,15 @@ static int validate_tmpl(int nr, struct
> xfrm_user_tmpl *ut, u16 family)
> default:
> return -EINVAL;
> }
> +   switch (ut[i].id.proto) {
> +   case IPPROTO_AH:
> +   case IPPROTO_ESP:
> +   case IPPROTO_COMP:
> +   break;
> +   default:
> +   return -EINVAL;
> +   }
> +
> }
> 
> return 0;
> 

I assume this is supposed to be fixed by the following, so marking it closed for
syzbot:

#syz fix: xfrm: check id proto in validate_tmpl()

But syzbot has been hitting a WARN_ON() in xfrm_state_fini() even after that
fix, so it should get reported as a new bug.
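
For readers following along, the check discussed in the quoted thread amounts
to the following. This is a userspace sketch only: tmpl_proto_is_valid() is a
made-up name, it mirrors the snippet Cong proposed above, and the change that
actually landed in validate_tmpl() may differ in detail.

```c
#include <stdbool.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: accept only the IPsec protocols named in the
 * proposal above. proto == 0 is rejected rather than treated as a
 * wildcard.
 */
static bool tmpl_proto_is_valid(int proto)
{
	switch (proto) {
	case IPPROTO_AH:
	case IPPROTO_ESP:
	case IPPROTO_COMP:
		return true;
	default:
		return false;
	}
}
```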


Re: KASAN: stack-out-of-bounds Read in xfrm_state_find (3)

2018-01-30 Thread Eric Biggers
On Wed, Dec 13, 2017 at 06:18:05AM +0100, Steffen Klassert wrote:
> On Tue, Dec 12, 2017 at 01:00:31PM -0800, Eric Biggers wrote:
> > Hi Steffen,
> > 
> > On Fri, Dec 01, 2017 at 08:27:43AM +0100, Steffen Klassert wrote:
> > > On Wed, Nov 22, 2017 at 08:05:00AM -0800, syzbot wrote:
> > > > syzkaller has found reproducer for the following crash on
> > > > 0c86a6bd85ff0629cd2c5141027fc1c8bb6cde9c
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> > > > compiler: gcc (GCC) 7.1.1 20170620
> > > > .config is attached
> > > > Raw console output is attached.
> > > > C reproducer is attached
> > > > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > > > for information about syzkaller reproducers
> > > > 
> > > > 
> > > > BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x30fc/0x3230
> > > > net/xfrm/xfrm_state.c:1051
> > > > Read of size 4 at addr 8801ccaa7af8 by task syzkaller231684/3045
> > > 
> > > The patch below should fix this. I plan to apply it to the ipsec tree
> > > after some advanced testing.
> > > 
> > > Subject: [PATCH RFC] xfrm: Fix stack-out-of-bounds with misconfigured 
> > > transport
> > >  mode policies.
> > > 
> > 
> > Are you still planning to apply this?  syzbot is still hitting this bug.
> 
> It is already applied to the ipsec tree, will go upstream by the end of
> this week.
>

Marking this fixed for syzbot:

#syz fix: xfrm: Fix stack-out-of-bounds with misconfigured transport mode 
policies.


Re: general protection fault in ___bpf_prog_run

2018-01-30 Thread Daniel Borkmann
On 01/30/2018 09:58 PM, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on bpf-next commit
> 868c36dcc949c26bc74fa4661b670d9acc6489e4 (Mon Jan 29 03:00:16 2018 +)
> Merge tag 'wireless-drivers-next-for-davem-2018-01-26' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Thanks for the report, looking into it right now.


Re: [PATCH] vsock.7: document VSOCK socket address family

2018-01-30 Thread Michael Kerrisk (man-pages)
Hi Stefan,

Ping on the below please, since it either blocks the man-pages release
I'd currently like to make, or I must remove the vsock.7 page for this
release.

Thanks,

Michael



On 26 January 2018 at 22:47, Michael Kerrisk (man-pages)
 wrote:
> Stefan,
>
> I've just now noted that your page came with no license. What license
> do you want to use? Please see
> https://www.kernel.org/doc/man-pages/licenses.html
>
> Thanks,
>
> Michael
>
>
> On 30 November 2017 at 12:21, Stefan Hajnoczi  wrote:
>> The AF_VSOCK address family has been available since Linux 3.9 without a
>> corresponding man page.
>>
>> This patch adds vsock.7 and describes its use along the same lines as
>> existing ip.7, unix.7, and netlink.7 man pages.
>>
>> CC: Jorgen Hansen 
>> CC: Dexuan Cui 
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>>  man7/vsock.7 | 175 
>> +++
>>  1 file changed, 175 insertions(+)
>>  create mode 100644 man7/vsock.7
>>
>> diff --git a/man7/vsock.7 b/man7/vsock.7
>> new file mode 100644
>> index 0..48c6c2e1e
>> --- /dev/null
>> +++ b/man7/vsock.7
>> @@ -0,0 +1,175 @@
>> +.TH VSOCK 7 2017-11-30 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +vsock \- Linux VSOCK address family
>> +.SH SYNOPSIS
>> +.B #include 
>> +.br
>> +.B #include 
>> +.PP
>> +.IB stream_socket " = socket(AF_VSOCK, SOCK_STREAM, 0);"
>> +.br
>> +.IB datagram_socket " = socket(AF_VSOCK, SOCK_DGRAM, 0);"
>> +.SH DESCRIPTION
>> +The VSOCK address family facilitates communication between virtual machines 
>> and
>> +the host they are running on.  This address family is used by guest agents 
>> and
>> +hypervisor services that need a communications channel that is independent 
>> of
>> +virtual machine network configuration.
>> +.PP
>> +Valid socket types are
>> +.B SOCK_STREAM
>> +and
>> +.B SOCK_DGRAM .
>> +.B SOCK_STREAM
>> +provides connection-oriented byte streams with guaranteed, in-order 
>> delivery.
>> +.B SOCK_DGRAM
>> +provides a connectionless datagram packet service.  Availability of these
>> +socket types is dependent on the underlying hypervisor.
>> +.PP
>> +A new socket is created with
>> +.PP
>> +socket(AF_VSOCK, socket_type, 0);
>> +.PP
>> +When a process wants to establish a connection it calls
>> +.BR connect (2)
>> +with a given destination socket address.  The socket is automatically bound 
>> to
>> +a free port if unbound.
>> +.PP
>> +A process can listen for incoming connections by first binding to a socket 
>> address using
>> +.BR bind (2)
>> +and then calling
>> +.BR listen (2).
>> +.PP
>> +Data is transferred using the usual
>> +.BR send (2)
>> +and
>> +.BR recv (2)
>> +family of socket system calls.
>> +.SS Address format
>> +A socket address is defined as a combination of a 32-bit Context Identifier 
>> (CID) and a 32-bit port number.  The CID identifies the source or 
>> destination, which is either a virtual machine or the host.  The port number 
>> differentiates between multiple services running on a single machine.
>> +.PP
>> +.in +4n
>> +.EX
>> +struct sockaddr_vm {
>> +    sa_family_t     svm_family;     /* address family: AF_VSOCK */
>> +    unsigned short  svm_reserved1;
>> +    unsigned int    svm_port;       /* port in native byte order */
>> +    unsigned int    svm_cid;        /* address in native byte order */
>> +};
>> +.EE
>> +.in
>> +.PP
>> +.I svm_family
>> +is always set to
>> +.BR AF_VSOCK .
>> +.I svm_reserved1
>> +is always set to 0.
>> +.I svm_port
>> +contains the port in native byte order.
>> +The port numbers below 1024 are called
>> +.IR "privileged ports" .
>> +Only a process with
>> +.B CAP_NET_BIND_SERVICE
>> +capability may
>> +.BR bind (2)
>> +to these port numbers.
>> +.PP
>> +There are several special addresses:
>> +.B VMADDR_CID_ANY
>> +(-1U)
>> +means any address for binding;
>> +.B VMADDR_CID_HYPERVISOR
>> +(0) and
>> +.B VMADDR_CID_RESERVED
>> +(1) are unused addresses;
>> +.B VMADDR_CID_HOST
>> +(2)
>> +is the well-known address of the host.
>> +.PP
>> +The special constant
>> +.B VMADDR_PORT_ANY
>> +(-1U)
>> +means any port number for binding.
>> +.SS Live migration
>> +Sockets are affected by live migration of virtual machines.  Connected
>> +.B SOCK_STREAM
>> +sockets become disconnected when the virtual machine migrates to a new host.
>> +Applications must reconnect when this happens.
>> +.PP
>> +The local CID may change across live migration if the old CID is not 
>> available
>> +on the new host.  Bound sockets are automatically updated to the new CID.
>> +.SS Ioctls
>> +.TP
>> +.B IOCTL_VM_SOCKETS_GET_LOCAL_CID
>> +Get the CID of the local machine.  The argument is a pointer to an unsigned 
>> int.
>> +.IP
>> +.in +4n
>> +.EX
>> +.IB error " = ioctl(" socket ", " IOCTL_VM_SOCKETS_GET_LOCAL_CID ", " &cid 
>> ");"
>> +.EE
>> +.in
>> +.IP
>> +Consider using
>> +.B VMADDR_CID_ANY
>> +when binding instead of getting the local CID with
>> +.B IOCTL_VM_SOCKETS_GET_LOCAL_CID .
>> +.SH ERRORS
>> +.TP
>> +.B EACCES
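
For readers of the quoted page, the address setup it describes can be
sketched in C roughly as follows. This is only a minimal sketch and is not
part of the patch: the helper names vsock_addr_init(),
vsock_port_is_privileged() and vsock_connect_host() are hypothetical, and it
assumes that <linux/vm_sockets.h> provides struct sockaddr_vm and
VMADDR_CID_HOST as the page states.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill a sockaddr_vm for a CID/port pair.
 * Both values are in native byte order, as the page describes. */
static inline void vsock_addr_init(struct sockaddr_vm *addr,
                                   unsigned int cid, unsigned int port)
{
    memset(addr, 0, sizeof(*addr));   /* also leaves svm_reserved1 at 0 */
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = cid;
    addr->svm_port = port;
}

/* Hypothetical helper: ports below 1024 are "privileged ports". */
static inline int vsock_port_is_privileged(unsigned int port)
{
    return port < 1024;
}

/* Hypothetical helper: connect a stream socket to a service on the host.
 * Returns the connected file descriptor, or -1 on failure. */
static inline int vsock_connect_host(unsigned int port)
{
    struct sockaddr_vm addr;
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;   /* e.g. no AF_VSOCK support on this system */

    vsock_addr_init(&addr, VMADDR_CID_HOST, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller might try vsock_connect_host(1234), with the port chosen
arbitrarily here, and treat a return value of -1 as "no VSOCK stream
transport available", since the page notes that socket type availability
depends on the underlying hypervisor.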

[git pull] reducing kernel_recvmsg() use

2018-01-30 Thread Al Viro
kernel_recvmsg() is a set_fs()-using wrapper for
sock_recvmsg().  In all but one case that is not needed -
use of ITER_KVEC for ->msg_iter takes care of the data
and does not care about set_fs().  The only exception is
svc_udp_recvfrom() where we want cmsg to be stored into
kernel object; everything else can just use sock_recvmsg()
and be done with that.

A followup converting svc_udp_recvfrom() away
from set_fs() (and killing kernel_recvmsg() off) is *NOT*
in that one - I'd like to hear what netdev folks think
of the approach proposed in that followup.

The following changes since commit 4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323:

  Linux 4.15-rc1 (2017-11-26 16:01:47 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.sock_recvmsg

for you to fetch changes up to bc4802736d8b17eddde52e00838c348770f67c19:

  tipc: switch to sock_recvmsg() (2017-12-02 20:38:10 -0500)


Al Viro (10):
  svc_recvfrom(): switch to sock_recvmsg()
  dlm: switch to sock_recvmsg()
  ncpfs: switch to sock_recvmsg()
  ocfs2: switch to sock_recvmsg()
  lustre lnet_sock_read(): switch to sock_recvmsg()
  drbd: switch to sock_recvmsg()
  mISDN: switch to sock_recvmsg()
  ipvs: switch to sock_recvmsg()
  smc: switch to sock_recvmsg()
  tipc: switch to sock_recvmsg()

 drivers/block/drbd/drbd_main.c|  8 +---
 drivers/block/drbd/drbd_receiver.c|  3 ++-
 drivers/isdn/mISDN/l1oip_core.c   | 22 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c | 24 +++-
 fs/dlm/lowcomms.c |  4 ++--
 fs/ncpfs/sock.c   |  3 ++-
 fs/ocfs2/cluster/tcp.c|  3 ++-
 net/netfilter/ipvs/ip_vs_sync.c   |  9 +++--
 net/smc/smc_clc.c | 18 ++
 net/sunrpc/svcsock.c  |  4 ++--
 net/tipc/server.c |  4 ++--
 11 files changed, 46 insertions(+), 56 deletions(-)


Re: BUG: unable to handle kernel NULL pointer dereference in addrconf_ifdown

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 11:50:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: addrconf_ifdown+0x3a2/0x780 net/ipv6/addrconf.c:3674
> PGD 1df99c067 P4D 1df99c067 PUD 1dfbbe067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 28113 Comm: syz-executor0 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:addrconf_ifdown+0x3a2/0x780 net/ipv6/addrconf.c:3674
> RSP: 0018:c900014bf928 EFLAGS: 00010293
> RAX:  RBX:  RCX: 82251944
> RDX: 880216be6808 RSI: 880216be69b8 RDI: 0001
> RBP: c900014bf980 R08:  R09: 
> R10:  R11:  R12: 880216be6800
> R13: 880216be6968 R14:  R15: 831755a0
> FS:  7f91d886d700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 0001df119000 CR4: 001426f0
> DR0: 2000 DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0600
> Call Trace:
>  addrconf_notify+0x86/0xd50 net/ipv6/addrconf.c:3514
>  notifier_call_chain+0x41/0xc0 kernel/notifier.c:93
>  __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>  raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>  call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1679
>  call_netdevice_notifiers net/core/dev.c:1697 [inline]
>  dev_close_many+0x121/0x190 net/core/dev.c:1492
>  rollback_registered_many+0x16f/0x490 net/core/dev.c:7265
>  rollback_registered+0x59/0x90 net/core/dev.c:7330
>  unregister_netdevice_queue+0xa5/0xf0 net/core/dev.c:8311
>  unregister_netdevice include/linux/netdevice.h:2464 [inline]
>  __tun_detach+0x618/0x680 drivers/net/tun.c:688
>  tun_detach drivers/net/tun.c:699 [inline]
>  tun_chr_close+0x26/0x30 drivers/net/tun.c:2955
>  __fput+0x120/0x270 fs/file_table.c:209
>  fput+0x15/0x20 fs/file_table.c:243
>  task_work_run+0xa3/0xe0 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x3e6/0x1050 kernel/exit.c:869
>  do_group_exit+0x60/0x100 kernel/exit.c:972
>  get_signal+0x36c/0xad0 kernel/signal.c:2337
>  do_signal+0x23/0x670 arch/x86/kernel/signal.c:809
>  exit_to_usermode_loop+0x13c/0x160 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
>  syscall_return_slowpath+0x1b4/0x1e0 arch/x86/entry/common.c:264
>  entry_SYSCALL_64_fastpath+0x94/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f91d886cc58 EFLAGS: 0212 ORIG_RAX: 0010
> RAX:  RBX: 007580d8 RCX: 00452a39
> RDX:  RSI: 4c06 RDI: 001c
> RBP: 02dc R08:  R09: 
> R10:  R11: 0212 R12: 006f2540
> R13:  R14: 7f91d886d6d4 R15: 0012
> Code: 17 e8 13 8a 06 ff 41 8b b4 24 50 02 00 00 31 c0 85 f6 0f 94 c0 89 45
> d0 e8 fc 89 06 ff 49 8b 44 24 08 49 8d 54 24 08 48 89 55 b8 <48> 8b 08 48 39
> d0 48 8d 98 c8 fe ff ff 48 89 45 c0 4c 8d b1 c8
> RIP: addrconf_ifdown+0x3a2/0x780 net/ipv6/addrconf.c:3674 RSP:
> c900014bf928
> CR2: 
> ---[ end trace 36c3336430e0eeeb ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


Re: BUG: unable to handle kernel NULL pointer dereference in sctp_cmp_addr_exact

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 11:49:03PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> binder: 23647:23660 DecRefs 0 refcount change on invalid ref 4 ret -22
> binder: 23647:23660 BC_CLEAR_DEATH_NOTIFICATION invalid ref 0
> binder: 23647:23660 BC_REQUEST_DEATH_NOTIFICATION invalid ref 3
> binder: 23647:23660 got reply transaction with no transaction stack
> binder: 23647:23660 transaction failed 29201/-71, size 24-16 line 2747
> BUG: unable to handle kernel NULL pointer dereference at 0078
> IP: sctp_cmp_addr_exact+0x14/0x60 net/sctp/associola.c:911
> PGD 1dde2b067 P4D 1dde2b067 PUD 1ddf17067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 23653 Comm: syz-executor1 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:sctp_cmp_addr_exact+0x14/0x60 net/sctp/associola.c:911
> RSP: 0018:c9da7b38 EFLAGS: 00010216
> RAX: 0001 RBX: fff0 RCX: 823e3464
> RDX: 0731 RSI: c90003199000 RDI: 0078
> RBP: c9da7b50 R08: 0001 R09: 0002
> R10: c9da7b18 R11: 0002 R12: 0078
> R13: 8801d9231488 R14: c9da7bc8 R15: 831e6c20
> FS:  7f601f498700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0078 CR3: 0001d9071000 CR4: 001426f0
> DR0: 2008 DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0600
> Call Trace:
>  sctp_hash_cmp+0x2b/0xb0 net/sctp/input.c:807
>  __rhashtable_lookup include/linux/rhashtable.h:633 [inline]
>  rhltable_lookup include/linux/rhashtable.h:716 [inline]
>  sctp_hash_transport+0x179/0xb00 net/sctp/input.c:890
>  sctp_assoc_add_peer+0x31d/0x450 net/sctp/associola.c:718
>  sctp_sendmsg+0xd59/0x14d0 net/sctp/socket.c:1921
>  inet_sendmsg+0x54/0x250 net/ipv4/af_inet.c:763
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0x51/0x70 net/socket.c:646
>  SYSC_sendto+0x17f/0x1d0 net/socket.c:1727
>  SyS_sendto+0x40/0x50 net/socket.c:1695
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f601f497c58 EFLAGS: 0212 ORIG_RAX: 002c
> RAX: ffda RBX: 007580d8 RCX: 00452a39
> RDX: 0001 RSI: 20aaff09 RDI: 001a
> RBP: 03a1 R08: 2030bfe4 R09: 001c
> R10:  R11: 0212 R12: 006f37b8
> R13:  R14: 7f601f4986d4 R15: 0002
> Code: 00 01 8d 50 01 89 93 34 06 00 00 5b 5d c3 66 0f 1f 84 00 00 00 00 00
> 55 48 89 e5 41 55 41 54 53 49 89 fc 49 89 f5 e8 dc 6e ed fe <41> 0f b7 3c 24
> e8 92 e3 ff ff 48 85 c0 74 21 48 89 c3 e8 c5 6e
> RIP: sctp_cmp_addr_exact+0x14/0x60 net/sctp/associola.c:911 RSP:
> c9da7b38
> CR2: 0078
> ---[ end trace 436f7126566693ea ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


Re: BUG: unable to handle kernel NULL pointer dereference in addrconf_notify

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 11:48:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> ALSA: seq fatal error: cannot create timer (-22)
> device syz2 entered promiscuous mode
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: addrconf_permanent_addr net/ipv6/addrconf.c:3351 [inline]
> IP: addrconf_notify+0x7df/0xd50 net/ipv6/addrconf.c:3422
> PGD 1fad11067 P4D 1fad11067 PUD 1fad7a067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 15171 Comm: syz-executor2 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:addrconf_permanent_addr net/ipv6/addrconf.c:3351 [inline]
> RIP: 0010:addrconf_notify+0x7df/0xd50 net/ipv6/addrconf.c:3422
> RSP: 0018:c9d63b80 EFLAGS: 00010282
> RAX:  RBX: 8801fa41a000 RCX: 8801faddd808
> RDX: 00ff RSI: 2811d9a2 RDI: 8801faddd968
> RBP: c9d63c00 R08: 0001 R09: 0004
> R10: c9d63af0 R11: 0004 R12: 0001
> R13: 8801faddd800 R14:  R15: 8801faddd800
> FS:  7f4373896700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 0002143c4001 CR4: 001606f0
> DR0: 2008 DR1: 2000 DR2: 2000
> DR3:  DR6: fffe0ff0 DR7: 0600
> Call Trace:
>  notifier_call_chain+0x41/0xc0 kernel/notifier.c:93
>  __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>  raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>  call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1679
>  call_netdevice_notifiers net/core/dev.c:1697 [inline]
>  __dev_notify_flags+0x75/0x130 net/core/dev.c:6874
>  dev_change_flags+0x5e/0x70 net/core/dev.c:6910
>  devinet_ioctl+0x77b/0x930 net/ipv4/devinet.c:1083
>  inet_ioctl+0xda/0x130 net/ipv4/af_inet.c:902
>  sock_do_ioctl+0x2c/0x60 net/socket.c:964
>  sock_ioctl+0x211/0x320 net/socket.c:1061
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0xaf/0x840 fs/ioctl.c:686
>  SYSC_ioctl fs/ioctl.c:701 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a09
> RSP: 002b:7f4373895c58 EFLAGS: 0212 ORIG_RAX: 0010
> RAX: ffda RBX: 0071bea0 RCX: 00452a09
> RDX: 2062ffe0 RSI: 8914 RDI: 0013
> RBP: 02eb R08:  R09: 
> R10:  R11: 0212 R12: 006f16a8
> R13:  R14: 7f43738966d4 R15: 
> Code: 85 ed 0f 84 f4 00 00 00 e8 ff 3a 06 ff 49 8d 85 68 01 00 00 48 89 c7
> 48 89 45 a8 e8 0c 9f 37 00 49 8b 45 08 4c 89 e9 48 83 c1 08 <48> 8b 38 48 39
> c1 4c 8d a8 c8 fe ff ff 4c 8d a7 c8 fe ff ff 0f
> RIP: addrconf_permanent_addr net/ipv6/addrconf.c:3351 [inline] RSP:
> c9d63b80
> RIP: addrconf_notify+0x7df/0xd50 net/ipv6/addrconf.c:3422 RSP:
> c9d63b80
> CR2: 
> ---[ end trace ebbf2e3971cab1b5 ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


Re: BUG: unable to handle kernel NULL pointer dereference in neigh_fill_info

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 10:41:00AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> netlink: 3 bytes leftover after parsing attributes in process
> `syz-executor5'.
> netlink: 3 bytes leftover after parsing attributes in process
> `syz-executor5'.
> BUG: unable to handle kernel NULL pointer dereference at 0008
> sock: process `syz-executor4' is using obsolete getsockopt SO_BSDCOMPAT
> IP: neigh_fill_info+0xe6/0x460 net/core/neighbour.c:2218
> PGD 1dece0067 P4D 1dece0067 PUD 1dc5b9067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 25561 Comm: syz-executor5 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:neigh_fill_info+0xe6/0x460 net/core/neighbour.c:2218
> RSP: 0018:c9eaf9d0 EFLAGS: 00010246
> RAX:  RBX: 88021568b000 RCX: 88021568b288
> RDX:  RSI: 0001 RDI: 880215753800
> RBP: c9eafa40 R08: 000c R09: 88020ff500b4
> R10: c9eaf9d8 R11: 0002 R12: 88020ff50098
> R13: 93f7 R14: 001c R15: 880215753800
> FS:  7f91a7086700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0008 CR3: 0001df751000 CR4: 001426f0
> DR0: 2008 DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 00030602
> Call Trace:
>  neigh_dump_table net/core/neighbour.c:2350 [inline]
>  neigh_dump_info+0x532/0x9d0 net/core/neighbour.c:2438
>  netlink_dump+0x14e/0x360 net/netlink/af_netlink.c:
>  __netlink_dump_start+0x1bb/0x210 net/netlink/af_netlink.c:2319
>  netlink_dump_start include/linux/netlink.h:214 [inline]
>  rtnetlink_rcv_msg+0x44f/0x5d0 net/core/rtnetlink.c:4485
>  netlink_rcv_skb+0x92/0x160 net/netlink/af_netlink.c:2441
>  rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
>  netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
>  netlink_unicast+0x1d4/0x290 net/netlink/af_netlink.c:1334
>  netlink_sendmsg+0x345/0x470 net/netlink/af_netlink.c:1897
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0x51/0x70 net/socket.c:646
>  sock_write_iter+0xa4/0x100 net/socket.c:915
>  call_write_iter include/linux/fs.h:1776 [inline]
>  new_sync_write fs/read_write.c:469 [inline]
>  __vfs_write+0x15b/0x1e0 fs/read_write.c:482
>  vfs_write+0xf0/0x230 fs/read_write.c:544
>  SYSC_write fs/read_write.c:589 [inline]
>  SyS_write+0x57/0xd0 fs/read_write.c:581
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f91a7085c58 EFLAGS: 0212 ORIG_RAX: 0001
> RAX: ffda RBX: 7f91a7086700 RCX: 00452a39
> RDX: 001f RSI: 2083a000 RDI: 0017
> RBP:  R08:  R09: 
> R10:  R11: 0212 R12: 
> R13: 00a6f7ff R14: 7f91a70869c0 R15: 
> Code: 14 01 00 00 41 88 44 24 1a 0f b6 83 16 01 00 00 41 88 44 24 1b 48 8b
> 83 80 02 00 00 8b 80 08 01 00 00 41 89 44 24 14 48 8b 43 08 <8b> 50 08 e8 c2
> e3 68 ff 85 c0 0f 85 5d 02 00 00 4c 8d 73 28 e8
> RIP: neigh_fill_info+0xe6/0x460 net/core/neighbour.c:2218 RSP:
> c9eaf9d0
> CR2: 0008
> ---[ end trace fd0910621fa1f590 ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


Re: BUG: unable to handle kernel NULL pointer dereference in ipv6_get_lladdr

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 08:38:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> device lo entered promiscuous mode
> device lo left promiscuous mode
> BUG: unable to handle kernel NULL pointer dereference at 0328
> IP: __read_once_size include/linux/compiler.h:183 [inline]
> IP: __in6_dev_get include/net/addrconf.h:300 [inline]
> IP: ipv6_get_lladdr+0x6f/0x270 net/ipv6/addrconf.c:1813
> PGD 0 P4D 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 16588 Comm: syz-executor1 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
> RIP: 0010:__in6_dev_get include/net/addrconf.h:300 [inline]
> RIP: 0010:ipv6_get_lladdr+0x6f/0x270 net/ipv6/addrconf.c:1813
> RSP: 0018:88021fd03da8 EFLAGS: 00010206
> RAX: 8801fcb0a280 RBX:  RCX: 8225359f
> RDX: 0100 RSI: aa440f11 RDI: 0286
> RBP: 88021fd03de0 R08: 0001 R09: 0002
> R10: 88021fd03d30 R11: 0002 R12: 8802156f2000
> R13: 0040 R14:  R15: 
> FS:  () GS:88021fd0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0328 CR3: 0301e002 CR4: 001606e0
> Call Trace:
>  
>  addrconf_rs_timer+0x128/0x230 net/ipv6/addrconf.c:3762
>  call_timer_fn+0x9f/0x3f0 kernel/time/timer.c:1320
>  expire_timers kernel/time/timer.c:1357 [inline]
>  __run_timers kernel/time/timer.c:1660 [inline]
>  run_timer_softirq+0x68a/0x830 kernel/time/timer.c:1686
>  __do_softirq+0xcb/0x4f3 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0xd4/0xe0 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
>  smp_apic_timer_interrupt+0x8e/0x2a0 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920
>  
> RIP: 0010:lock_acquire+0x21/0x220 kernel/locking/lockdep.c:3906
> RSP: 0018:c9ee79f8 EFLAGS: 0292 ORIG_RAX: ff11
> RAX: 8801fcb0a280 RBX: ea00084e2340 RCX: 0002
> RDX:  RSI:  RDI: 83080740
> RBP: c9ee7a38 R08:  R09: 
> R10:  R11: 8801fcb0a280 R12: ea00084e2340
> R13: ea00084e2340 R14:  R15: 
>  rcu_lock_acquire include/linux/rcupdate.h:244 [inline]
>  rcu_read_lock include/linux/rcupdate.h:630 [inline]
>  lock_page_memcg+0x33/0xf0 mm/memcontrol.c:1620
>  page_remove_file_rmap mm/rmap.c:1213 [inline]
>  page_remove_rmap+0x12e/0x4b0 mm/rmap.c:1298
>  zap_pte_range mm/memory.c:1337 [inline]
>  zap_pmd_range mm/memory.c:1441 [inline]
>  zap_pud_range mm/memory.c:1470 [inline]
>  zap_p4d_range mm/memory.c:1491 [inline]
>  unmap_page_range+0x86f/0xed0 mm/memory.c:1512
>  unmap_single_vma+0xbb/0x140 mm/memory.c:1557
>  unmap_vmas+0x65/0xd0 mm/memory.c:1587
>  exit_mmap+0xb0/0x1e0 mm/mmap.c:3020
>  __mmput kernel/fork.c:966 [inline]
>  mmput+0x82/0x190 kernel/fork.c:987
>  exit_mm kernel/exit.c:544 [inline]
>  do_exit+0x356/0x1050 kernel/exit.c:856
>  do_group_exit+0x60/0x100 kernel/exit.c:972
>  get_signal+0x36c/0xad0 kernel/signal.c:2337
>  do_signal+0x23/0x670 arch/x86/kernel/signal.c:809
>  exit_to_usermode_loop+0x13c/0x160 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
>  syscall_return_slowpath+0x1b4/0x1e0 arch/x86/entry/common.c:264
>  entry_SYSCALL_64_fastpath+0x94/0x96
> RIP: 0033:0x452a09
> RSP: 002b:7f96787f3ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX: fe00 RBX: 0071c0f0 RCX: 00452a09
> RDX:  RSI:  RDI: 0071c0f0
> RBP: 0071c0f0 R08: 0422 R09: 0071c0c8
> R10:  R11: 0246 R12: 
> R13: 00a2f7ff R14: 7f96787f49c0 R15: 0006
> Code: c7 40 07 08 83 e8 b2 2e fd fe e8 3d 9e ff fe 85 c0 5a 74 12 e8 b3 6d
> 06 ff 80 3d 73 10 f6 00 00 0f 84 7a 01 00 00 e8 a1 6d 06 ff <4c> 8b a3 28 03
> 00 00 e8 15 9e ff fe 85 c0 74 12 e8 8c 6d 06 ff
> RIP: __read_once_size include/linux/compiler.h:183 [inline] RSP:
> 88021fd03da8
> RIP: __in6_dev_get include/net/addrconf.h:300 [inline] RSP: 88021fd03da8
> RIP: ipv6_get_lladdr+0x6f/0x270 net/ipv6/addrconf.c:1813 RSP:
> 88021fd03da8
> CR2: 0328
> ---[ end trace cc2639f5bc6e4363 ]---

Invalidating this bug 

Re: BUG: unable to handle kernel NULL pointer dereference in qdisc_match_from_root

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 05:43:00AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> netlink: 1 bytes leftover after parsing attributes in process
> `syz-executor6'.
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: qdisc_dev include/net/sch_generic.h:379 [inline]
> IP: qdisc_match_from_root+0x19/0xd0 net/sched/sch_api.c:266
> PGD 1db153067 P4D 1db153067 PUD 1da621067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 15445 Comm: syz-executor5 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:qdisc_dev include/net/sch_generic.h:379 [inline]
> RIP: 0010:qdisc_match_from_root+0x19/0xd0 net/sched/sch_api.c:266
> RSP: 0018:c9dafac8 EFLAGS: 00010216
> RAX:  RBX: 8801de311c00 RCX: 82123f94
> RDX: 027f RSI: c90004086000 RDI: 8801db4de400
> RBP: c9dafae0 R08:  R09: c9dafb30
> R10: c9dafbb8 R11: 0004 R12: 8801db4de400
> R13: 0500 R14: 8801dfa1e1c0 R15: 0500
> FS:  7f98eef31700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 0001da7ef000 CR4: 001406f0
> DR0: 2000 DR1: 2000 DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0600
> Call Trace:
>  qdisc_lookup+0x2d/0x140 net/sched/sch_api.c:306
>  tc_get_qdisc+0xe2/0x380 net/sched/sch_api.c:1259
>  rtnetlink_rcv_msg+0x333/0x5d0 net/core/rtnetlink.c:4522
>  netlink_rcv_skb+0x92/0x160 net/netlink/af_netlink.c:2441
>  rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
>  netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
>  netlink_unicast+0x1d4/0x290 net/netlink/af_netlink.c:1334
>  netlink_sendmsg+0x345/0x470 net/netlink/af_netlink.c:1897
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0x51/0x70 net/socket.c:646
>  sock_write_iter+0xa4/0x100 net/socket.c:915
>  call_write_iter include/linux/fs.h:1776 [inline]
>  new_sync_write fs/read_write.c:469 [inline]
>  __vfs_write+0x15b/0x1e0 fs/read_write.c:482
>  vfs_write+0xf0/0x230 fs/read_write.c:544
>  SYSC_write fs/read_write.c:589 [inline]
>  SyS_write+0x57/0xd0 fs/read_write.c:581
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f98eef30c58 EFLAGS: 0212 ORIG_RAX: 0001
> RAX: ffda RBX: 00758020 RCX: 00452a39
> RDX: 005e RSI: 20fab000 RDI: 0013
> RBP: 039b R08:  R09: 
> R10:  R11: 0212 R12: 006f3728
> R13:  R14: 7f98eef316d4 R15: 
> Code: 49 c7 c4 28 76 ff 83 eb dd 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
> 55 41 54 53 49 89 fc 41 89 f5 e8 ac 63 19 ff 49 8b 44 24 40 <48> 8b 18 48 85
> db 0f 84 8b 00 00 00 e8 96 63 19 ff 41 f6 44 24
> RIP: qdisc_dev include/net/sch_generic.h:379 [inline] RSP: c9dafac8
> RIP: qdisc_match_from_root+0x19/0xd0 net/sched/sch_api.c:266 RSP:
> c9dafac8
> CR2: 
> ---[ end trace c23e90bc7d735c44 ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


Re: BUG: unable to handle kernel NULL pointer dereference in tc_fill_qdisc

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 04:49:02AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: qdisc_dev include/net/sch_generic.h:379 [inline]
> IP: tc_fill_qdisc+0xc8/0x4b0 net/sched/sch_api.c:792
> PGD 1dc5d4067 P4D 1dc5d4067 PUD 1db5fa067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 7609 Comm: syz-executor4 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:qdisc_dev include/net/sch_generic.h:379 [inline]
> RIP: 0010:tc_fill_qdisc+0xc8/0x4b0 net/sched/sch_api.c:792
> RSP: 0018:c9ea38c8 EFLAGS: 00010212
> RAX:  RBX: 880216ac3c00 RCX: 8212563c
> RDX: 0a02 RSI: c90004f53000 RDI: 8801dc6a20b0
> RBP: c9ea3970 R08: 0014 R09: 8801dc6a20b0
> R10: c9ea37d8 R11: 0002 R12: 880214161b00
> R13: 8801dc6a208c R14:  R15: 0002
> FS:  7f6db5bc5700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 0001dc75a000 CR4: 001426f0
> DR0: 2000 DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0600
> Call Trace:
>  tc_dump_qdisc_root+0x1f1/0x220 net/sched/sch_api.c:1474
>  tc_dump_qdisc+0x1a1/0x280 net/sched/sch_api.c:1540
>  netlink_dump+0x14e/0x360 net/netlink/af_netlink.c:
>  __netlink_dump_start+0x1bb/0x210 net/netlink/af_netlink.c:2319
>  netlink_dump_start include/linux/netlink.h:214 [inline]
>  rtnetlink_rcv_msg+0x44f/0x5d0 net/core/rtnetlink.c:4485
>  netlink_rcv_skb+0x92/0x160 net/netlink/af_netlink.c:2441
>  rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
>  netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
>  netlink_unicast+0x1d4/0x290 net/netlink/af_netlink.c:1334
>  netlink_sendmsg+0x345/0x470 net/netlink/af_netlink.c:1897
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0x51/0x70 net/socket.c:646
>  ___sys_sendmsg+0x35e/0x3b0 net/socket.c:2026
>  __sys_sendmsg+0x50/0x90 net/socket.c:2060
>  SYSC_sendmsg net/socket.c:2071 [inline]
>  SyS_sendmsg+0x2d/0x50 net/socket.c:2067
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f6db5bc4c58 EFLAGS: 0212 ORIG_RAX: 002e
> RAX: ffda RBX: 00758020 RCX: 00452a39
> RDX:  RSI: 2061efc8 RDI: 001c
> RBP: 0343 R08:  R09: 
> R10:  R11: 0212 R12: 006f2ee8
> R13:  R14: 7f6db5bc56d4 R15: 
> Code: 41 b8 14 00 00 00 4c 89 e7 e8 05 17 01 00 48 85 c0 49 89 c5 0f 84 cb
> 02 00 00 e8 04 4d 19 ff 41 c7 45 10 00 00 00 00 48 8b 43 40 <48> 8b 00 8b 80
> 08 01 00 00 45 89 75 1c 41 89 45 14 8b 43 38 41
> RIP: qdisc_dev include/net/sch_generic.h:379 [inline] RSP: c9ea38c8
> RIP: tc_fill_qdisc+0xc8/0x4b0 net/sched/sch_api.c:792 RSP: c9ea38c8
> CR2: 
> ---[ end trace dffd1876816a33a6 ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid


RE: macvlan devices and vlan interaction

2018-01-30 Thread Keller, Jacob E
> -----Original Message-----
> From: Shannon Nelson [mailto:shannon.nel...@oracle.com]
> Sent: Tuesday, January 30, 2018 12:30 PM
> To: Keller, Jacob E ; netdev@vger.kernel.org
> Cc: Duyck, Alexander H 
> Subject: Re: macvlan devices and vlan interaction
> 
> Hi Jake,
> 
> The current behavior seems logical to me, but I suppose Alex might argue
> differently.  The macvlan was put onto the default lowerdev assuming the
> lowerdev will hand it all the default traffic, and then the macvlan
> splits out its own vlan traffic.  As soon as the lowerdev assumption
> changes, it is going to change what gets pushed up to the macvlan dev.
> If the lowerdev is separating the vlan traffic out of the "default" flow
> headed to the macvlan, then the initial assumption has changed and the
> vlan traffic has been vectored off before it can be delivered up the
> stack to the macvlan.
> 
> There's an argument that the lowerdev shouldn't know anything about the
> upperdev's routing, just deliver to the upperdev and let the upperdev
> worry about it.  But perhaps this becomes a question of precedence:
> does the lowerdev split traffic first by mac address or by vlan tag?
> 

There are a few issues at play here. (1) The device driver has no idea which
VLANs apply to which devs. When a VLAN is added to an upperdev, the lowerdev
just gets a notification saying please add VLAN N. The lowerdev doesn't have a
clue which upperdev this applies to.

The second issue (2) is that, when deciding where traffic goes, the stack
prioritises VLANs over macvlan upperdevs, so we end up routing traffic that
should have gone to a macvlan into a VLAN attached to the lowerdev instead.

> I don't like your option 1: as you point out, it breaks current
> functionality, likely depended upon in some containers that are using
> macvlans to manage their traffic.  We don't know what's going on inside
> that container and I don't think we want to break its ability to split
> its own vlans.
> 

I don't really want to break the ability either, but look at this scenario:

An upperdev macvlan is created on some lowerdev and put into a container.
The upperdev creates VLAN 10 and starts receiving VLAN 10 traffic.

Now VLAN 10 is created directly on the same lowerdev, possibly by someone
unaware of what the container did.

Suddenly the upperdev macvlan no longer receives any VLAN 10 traffic.
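
To make the scenario concrete, here is a minimal sketch of the precedence
problem as a toy model in Python. It is not kernel code: the function and
variable names (deliver, lower_vlans, macvlans) are hypothetical, and the
lookup order is only an assumption taken from the description in this mail
(a VLAN on the lowerdev is matched before any macvlan).

```python
# Toy model of the receive-side precedence described in this thread.
# Not kernel code; every name here is hypothetical.

def deliver(lower_vlans, macvlans, dst_mac, vid):
    """Return the name of the device that would receive the frame, or None.

    lower_vlans: set of VLAN IDs configured directly on the lowerdev
    macvlans:    dict mapping MAC address -> (macvlan name, set of its VLAN IDs)
    vid:         VLAN ID of the frame, or None if the frame is untagged
    """
    # Assumed precedence: a VLAN on the lowerdev is matched before any macvlan.
    if vid is not None and vid in lower_vlans:
        return f"lowerdev.{vid}"
    if dst_mac in macvlans:
        name, vlans = macvlans[dst_mac]
        if vid is None:
            return name
        # Simplification: tagged frames for an unknown VLAN are dropped.
        return f"{name}.{vid}" if vid in vlans else None
    return "lowerdev" if vid is None else None


mac = "02:00:00:00:00:01"              # MAC of the macvlan in the container
macvlans = {mac: ("macvlan0", {10})}   # container created VLAN 10 on it
lower_vlans = set()                    # nothing on the lowerdev yet

before = deliver(lower_vlans, macvlans, mac, 10)
lower_vlans.add(10)                    # VLAN 10 added directly on the lowerdev
after = deliver(lower_vlans, macvlans, mac, 10)
print(before, "->", after)
```

In this model the same frame (same destination MAC, same VLAN ID) changes
owner purely because of a change made on the lowerdev; nothing in the
macvlan's own configuration was touched.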

Worse, the behavior is *different* depending on whether the macvlan is 
offloaded or not.

In an offloaded macvlan, at least from what I can tell, VLANs do not work on
any open source driver in the upstream kernel today, so in the original case
the VLAN 10 created by the upperdev will just not receive traffic. This is a
separate issue which I have a patch to resolve, but it still has problems with
the leaked VLAN issue (where VLANs are added to the lowerdev directly).

You can argue that this is administrator error, but I'd rather fix it so that
it's not possible one way or another. Unfortunately, I don't see any good way
to prevent this. The driver doesn't have any indication which VLANs apply to
which devices.

> Like I said, I think the current behavior is mostly correct, but a
> version of option 2 might be good to help support offload of the
> mac+vlan pair into a macvlan channel.
> 
> sln
> 

I don't really like either option, so suggestions are welcome.

Thanks,
Jake


Re: BUG: unable to handle kernel NULL pointer dereference in ip_mc_up

2018-01-30 Thread Eric Biggers
On Tue, Dec 19, 2017 at 12:40:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> netlink: 29 bytes leftover after parsing attributes in process
> `syz-executor2'.
> device eql entered promiscuous mode
> BUG: unable to handle kernel NULL pointer dereference at 0578
> IP: read_pnet include/net/net_namespace.h:270 [inline]
> IP: dev_net include/linux/netdevice.h:2041 [inline]
> IP: ip_mc_up+0x15/0x170 net/ipv4/igmp.c:1736
> PGD 1df61f067 P4D 1df61f067 PUD 1de8fc067 PMD 0
> Oops:  [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 23740 Comm: syz-executor6 Not tainted 4.15.0-rc3-next-20171214+
> #67
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> netlink: 29 bytes leftover after parsing attributes in process
> `syz-executor2'.
> RIP: 0010:read_pnet include/net/net_namespace.h:270 [inline]
> RIP: 0010:dev_net include/linux/netdevice.h:2041 [inline]
> RIP: 0010:ip_mc_up+0x15/0x170 net/ipv4/igmp.c:1736
> RSP: 0018:c900010abb88 EFLAGS: 00010212
> RAX:  RBX: 0001 RCX: 821dd421
> RDX: 235e RSI: c900035dd000 RDI: 8801df56d000
> RBP: c900010abba0 R08:  R09: 
> R10: c900010abc20 R11:  R12: 8801df56d000
> R13: 8801df652000 R14:  R15: 8801df56d000
> FS:  7f4c60214700() GS:88021fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0578 CR3: 0001dc7f1000 CR4: 001426f0
> Call Trace:
>  inetdev_event+0x224/0x5c0 net/ipv4/devinet.c:1504
>  notifier_call_chain+0x41/0xc0 kernel/notifier.c:93
>  __raw_notifier_call_chain kernel/notifier.c:394 [inline]
>  raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>  call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1679
>  call_netdevice_notifiers net/core/dev.c:1697 [inline]
>  __dev_notify_flags+0x75/0x130 net/core/dev.c:6874
>  dev_change_flags+0x5e/0x70 net/core/dev.c:6910
>  devinet_ioctl+0x77b/0x930 net/ipv4/devinet.c:1083
>  inet_ioctl+0xda/0x130 net/ipv4/af_inet.c:902
>  sock_do_ioctl+0x2c/0x60 net/socket.c:964
>  sock_ioctl+0x211/0x320 net/socket.c:1061
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  do_vfs_ioctl+0xaf/0x840 fs/ioctl.c:686
>  SYSC_ioctl fs/ioctl.c:701 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a39
> RSP: 002b:7f4c60213c58 EFLAGS: 0212 ORIG_RAX: 0010
> RAX: ffda RBX: 00758020 RCX: 00452a39
> RDX: 2062ffe0 RSI: 8914 RDI: 0016
> RBP: 026a R08:  R09: 
> R10:  R11: 0212 R12: 006f1a90
> R13:  R14: 7f4c602146d4 R15: 
> Code: c7 90 cf e5 82 e8 44 04 06 ff e8 dc e2 3c 00 e9 64 ff ff ff 66 90 55
> 48 89 e5 41 55 41 54 53 49 89 fc e8 1f cf 0d ff 49 8b 04 24 <48> 8b 98 78 05
> 00 00 e8 ff c6 f0 ff 85 c0 0f 84 17 01 00 00 e8
> RIP: read_pnet include/net/net_namespace.h:270 [inline] RSP:
> c900010abb88
> RIP: dev_net include/linux/netdevice.h:2041 [inline] RSP: c900010abb88
> RIP: ip_mc_up+0x15/0x170 net/ipv4/igmp.c:1736 RSP: c900010abb88
> CR2: 0578
> ---[ end trace 41722ce46f86bba0 ]---

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot kconfig due to a change to the
kconfig menus in linux-next (so this crash was possibly caused by slab
corruption elsewhere).

#syz invalid

