Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer

On Mon, 28 Sep 2015 11:30:00 -0500 (CDT) Christoph Lameter wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > Not knowing SLUB as well as you, it took me several hours to realize
> > init_object() didn't overwrite the freepointer in the object.  Thus, I
> > think these comments make the reader aware of not-so-obvious
> > side-effects of SLAB_POISON and SLAB_RED_ZONE.
> 
> From the source:
> 
> /*
>  * Object layout:
>  *
>  * object address
>  *  Bytes of the object to be managed.
>  *  If the freepointer may overlay the object then the free
>  *  pointer is the first word of the object.
>  *
>  *  Poisoning uses 0x6b (POISON_FREE) and the last byte is
>  *  0xa5 (POISON_END)
>  *
>  * object + s->object_size
>  *  Padding to reach word boundary. This is also used for Redzoning.
>  *  Padding is extended by another word if Redzoning is enabled and
>  *  object_size == inuse.
>  *
>  *  We fill with 0xbb (RED_INACTIVE) for inactive objects and with
>  *  0xcc (RED_ACTIVE) for objects in use.
>  *
>  * object + s->inuse
>  *  Meta data starts here.
>  *
>  *  A. Free pointer (if we cannot overwrite object on free)
>  *  B. Tracking data for SLAB_STORE_USER
>  *  C. Padding to reach required alignment boundary or at mininum
>  *  one word if debugging is on to be able to detect writes
>  *  before the word boundary.

Okay, I will remove the comment.

The best doc on SLUB and SLAB layout comes from your slides titled:
 "Slab allocators in the Linux Kernel: SLAB, SLOB, SLUB"

Let's gracefully add a link to the slides here:
 http://events.linuxfoundation.org/sites/events/files/slides/slaballocators.pdf
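
For readers who hit the same confusion, here is a minimal userspace
sketch of why poisoning leaves the free pointer alone.  It assumes the
free pointer is stored at object + s->inuse (case A in the layout
comment quoted above).  struct cache_sketch and the three helpers are
hypothetical stand-ins, not the SLUB code itself; only the 0x6b and
0xa5 poison bytes come from the quoted comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b /* fill byte for a free object */
#define POISON_END  0xa5 /* last byte of a poisoned object */

/* Hypothetical, simplified stand-in for struct kmem_cache. */
struct cache_sketch {
	size_t object_size; /* bytes of the object to be managed */
	size_t inuse;       /* where the metadata starts */
	size_t offset;      /* where the free pointer is stored */
};

/* Poison only the object bytes: 0x6b everywhere, 0xa5 in the last byte. */
static void poison_object(const struct cache_sketch *s, unsigned char *obj)
{
	assert(s->object_size > 0);
	memset(obj, POISON_FREE, s->object_size - 1);
	obj[s->object_size - 1] = POISON_END;
}

/* Store the free pointer at obj + s->offset. */
static void set_freepointer(const struct cache_sketch *s,
			    unsigned char *obj, void *fp)
{
	memcpy(obj + s->offset, &fp, sizeof(fp));
}

/* Read the free pointer back from obj + s->offset. */
static void *get_freepointer(const struct cache_sketch *s,
			     const unsigned char *obj)
{
	void *fp;

	memcpy(&fp, obj + s->offset, sizeof(fp));
	return fp;
}
```

With offset == inuse the poison pattern stops before the free pointer,
so a pointer stored before poisoning is still readable afterwards.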

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net] skbuff: Fix skb checksum flag on skb pull

2015-09-29 Thread Andrey Vagin
On Tue, Sep 29, 2015 at 03:27:03AM +0300, Pravin Shelar wrote:
> On Mon, Sep 28, 2015 at 2:46 AM, Andrew Vagin  wrote:
> > Hi,
> >
> > With this patch, I can't connect two local tcp ipv6 sockets.
> >
> > [root@fc22-vm criu]# strace -e network python ipv6.py
> > socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
> > bind(3, {sa_family=AF_INET6, sin6_port=htons(8976), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
> > listen(3, 1)= 0
> > socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
> > connect(4, {sa_family=AF_INET6, sin6_port=htons(8976), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
> >
> > [root@fc22-vm criu]# cat ipv6.py
> > import socket
> >
> > srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
> > srv.bind(("::0", 8976))
> > srv.listen(1)
> > c = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
> > c.connect(("::1", 8976))
> >
> 
> Can you try following patch.
> https://patchwork.ozlabs.org/patch/523632/

It works for me. Thanks.


Re: [PATCH] [net] orinoco_usb:Fix error handling in ezusb_probe()

2015-09-29 Thread Kalle Valo
RUC_Soft_Sec  writes:

> Current code assigns 0 to variable 'retval', which makes ezusb_probe()
> return success even if alloc_orinocodev() fails.
>
> The related code snippet in ezusb_probe() is as follows.
>
> 1573 static int ezusb_probe(struct usb_interface *interface,
> 1574const struct usb_device_id *id)
> 1575 {
>
>  
>
> 1583 int retval = 0;
> 1584 int i;
> 1585
> 1586 priv = alloc_orinocodev(sizeof(*upriv), >dev,
> 1587 ezusb_hard_reset, NULL);
> 1588 if (!priv) {
> 1589 err("Couldn't allocate orinocodev");
> 1590 goto exit;
> 1591 }
>  ...
>
> 1729  exit:
> 1730 if (fw_entry) {
> 1731 firmware.code = NULL;
> 1732 firmware.size = 0;
> 1733 release_firmware(fw_entry);
> 1734 }
> 1735 usb_set_intfdata(interface, upriv);
> 1736 return retval;
> 1737 }
>
> Fix it by checking the return value from alloc_orinocodev() and assigning
> '-ENOMEM' to variable 'retval' in the case of error.
>
> Signed-off-by: Zhang Yan ---
>  orinoco_usb.c |1 +
>  1 file changed, 1 insertion(+)
> diff --git a/orinoco_usb.c b/orinoco_usb.c

The patch looks corrupted. And the from header doesn't contain a proper
name.
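
For reference, the pattern the changelog describes can be sketched in
userspace as below.  This is only an illustration of the single-exit
error path, not the driver code: alloc_dev_sketch() and probe_sketch()
are hypothetical stand-ins, and malloc() replaces the real allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for alloc_orinocodev(); fails on request. */
static void *alloc_dev_sketch(int simulate_failure)
{
	return simulate_failure ? NULL : malloc(16);
}

/* Sketch of a probe routine with a single exit label. */
static int probe_sketch(int simulate_failure)
{
	int retval = 0;
	void *priv;

	priv = alloc_dev_sketch(simulate_failure);
	if (!priv) {
		/* Without this assignment the caller would see success. */
		retval = -ENOMEM;
		goto exit;
	}

	/* ... the rest of the setup would go here ... */
	free(priv);

exit:
	return retval;
}
```

The point is only that every "goto exit" taken on failure needs retval
set first, because the exit label returns whatever retval holds.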

-- 
Kalle Valo


Re: unregister_netdevice warnings when deleting netns

2015-09-29 Thread Julian Anastasov

Hello,

On Mon, 28 Sep 2015, Eric W. Biederman wrote:

> Julian Anastasov  writes:
> 
> > On Mon, 28 Sep 2015, Anand Gurram wrote:
> >
> >> I am currently using kernel version 3.16.7 on a linux switch.
> >> While creating and destroying network namespaces I am observing below logs
> >> on the console
> >> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
> >> 
> >> Can you please suggest and provide instructions on how to debug this issue.
> >> If any fix already available can you please point me to the link.
> >
> > There are two commits from Linux 4.2 that may help:
> >
> > commit e9e4dd3267d0 ("net: do not process device backlog during 
> > unregistration")
> > commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
> >
> If that message repeats indefinitely it means there is a leaked
> reference to the network namespace's lo device.
> 
> If the message just spits out a few times and then goes away it simply
> means that something is taking a while to clean up and drop its
> reference.
> 
> This is slightly complicated by the fact that it is not uncommon when a
> network device goes away to redirect all references to itself to the lo
> device.

Yes, with forwarding disabled (i.e. when the presence of
"ipv4: Avoid crashing in ip_error" does not matter) there is a small
chance that an in-flight packet leaves a new reference somewhere
without crashing. But it may be another problem, of course.

Regards

--
Julian Anastasov 


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-29 Thread David Woodhouse
On Mon, 2015-09-28 at 12:37 -0700, Tom Herbert wrote:
> I think it's easier to just call skb_checksum_help from the driver
> when the packet is actually sent to the device (should be no cost for
> late binding).

That's true for checksum. Not for things like TSO though, and I wonder
if it's worth keeping it simple and doing it *all* in
.ndo_features_check()? 

> > Note that 'seeded with an IPv[46] pseudo header' isn't quite
> > sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
> > a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
> > -like checksum works out fine.
> > 
> UDP or TCP can be determined from csum_offset, e.g. 16=>TCP 6=>UDP

Kind of. There'll be false positives there too, though. That was
actually the basis of my first attempt to address this, at
http://lists.openwall.net/netdev/2013/01/14/36
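
To make the csum_offset heuristic concrete, here is a small sketch.
The header structs are defined locally (hypothetical names, standard
TCP and UDP field order) so it does not depend on any platform header,
and guess_l4() is an illustration rather than anything in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal TCP header layout; the checksum lands at byte 16. */
struct tcp_hdr_sketch {
	uint16_t source, dest;
	uint32_t seq, ack_seq;
	uint16_t flags;   /* data offset, reserved bits and flags */
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

/* Minimal UDP header layout; the checksum lands at byte 6. */
struct udp_hdr_sketch {
	uint16_t source, dest, len, check;
};

enum l4_guess { L4_UNKNOWN, L4_TCP_LIKE, L4_UDP_LIKE };

/* Guess the L4 protocol purely from where the checksum field sits. */
static enum l4_guess guess_l4(uint16_t csum_offset)
{
	if (csum_offset == offsetof(struct tcp_hdr_sketch, check))
		return L4_TCP_LIKE;
	if (csum_offset == offsetof(struct udp_hdr_sketch, check))
		return L4_UDP_LIKE;
	return L4_UNKNOWN;
}
```

The false positives follow directly: any other protocol that keeps its
checksum at offset 16 or 6 is indistinguishable from TCP or UDP here.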

-- 
dwmw2





Re: mwifiex: Make mwifiex_dbg a function, reduce object size

2015-09-29 Thread Kalle Valo

> The mwifiex_dbg macro has two tests that could be consolidated
> into a function reducing overall object size ~10KB (~4%).
> 
> So convert the macro into a function.
> 
> $ size drivers/net/wireless/mwifiex/built-in.o* (x86-64 defconfig)
>    text    data     bss     dec     hex filename
>  233102    8628    4809  246539   3c30b drivers/net/wireless/mwifiex/built-in.o.new
>  243949    8628    4809  257386   3ed6a drivers/net/wireless/mwifiex/built-in.o.old
> 
> Signed-off-by: Joe Perches 

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular

2015-09-29 Thread Finn Thain

Hi Paul,

On Mon, 28 Sep 2015, Paul Gortmaker wrote:

> On 28/09/2015 (Mon 23:09) Geert Uytterhoeven wrote:
> 
> > Hi Paul,
...
> > 
> > Why did you choose this approach?
> > What about changing the "bool"s to "tristate"s in Kconfig instead?
> 
> Long answer is here:
> 
> https://lkml.org/lkml/2015/8/24/888

You wrote, "If there was demand for them to be tristate, then it would 
have happened by now." I don't follow your reasoning. You might just as 
well remove entire drivers and then argue, "If there was demand for 
drivers without bugs, then someone would have written them by now".

Perhaps you meant, "If there was sufficient demand for them to be 
tristate, then sufficient resources would have been marshalled, as 
required to get an enhancement written, tested, submitted, reviewed and 
merged in the mainline kernel."

> 
> To summarize, it adds functionality to code I can't test, and with 300 
> or so of these, it already has been a large time sink.  Add to that 
> extending the functionality and testing the new functionality, and it 
> does not scale. Plus if something hasn't allowed tristate for over 10 
> years, where is the value in adding it now?

There is value to be gained by completing the tristate support, and there 
is value destroyed by removing the partial tristate support.

I'm not involved in building distro kernels, but I know that Debian's 
would benefit from these tristates, because it would reduce the size of 
the m68k multi-platform kernel binary.

And even if it is dead code you aim to remove, a lot of people have worked 
on it (according to git blame), including myself. We should not disregard 
that effort when we could leverage it instead.

For the macmace driver in particular, I did the platform driver 
conversion, and it should work as a module. I did not change it to 
tristate at the time because I did not want to deal with the question of 
the 'psc' global, which lacks an EXPORT_SYMBOL(psc). Anyway, I'll send a 
patch if Geert doesn't do so first.

> 
> > I gave it a try, and with some small changes the three m68k ethernet 
> > drivers build fine as modular drivers. I can send patches if you like 
> > it.
> 
> Per above, I don't see the value in it, but if you want to do it and 
> test it and own submitting the patches, then I can drop the 
> corresponding ones from my queue.

I can't test right now but I have the hardware and will attend to any 
issues if need be. I do not expect any issues, because the modular option 
seems to involve the same code paths in the driver.

If the CONFIG_MACMACE=m option was implemented badly and did not work 
correctly, at least it couldn't be called a regression, presuming that 'm' 
builds okay, and that the default was 'y' or 'n'.

> Either way we get the code matching the Kconfig which is what I'm after 
> out of this.

Yes, me too.

> 
> Note that if you do decide to do this, the one driver really needs more 
> than just tristate one line change, it had super ancient init code that 
> predates module_init and probably needs an update.

I think the solution for mac8390 is to do in the modular case exactly what 
Space.c does in the built-in case. That would mean that the modular driver 
would support only one card, just like the built-in driver. (That 
limitation is a problem which affects all Nubus card drivers, because they 
have to do all their own bus matching, because Nubus still lacks the 
necessary driver model support.)

I haven't looked at amd/hplance, but I expect that the issues are similar.

Geert, do you plan to send patches for any of these drivers?

Regards,
Finn

> 
> Thanks,
> Paul.
> --
> 
> > 
> > Thanks!
> > 
> > Gr{oetje,eeting}s,
> > 
> > Geert



Re: unregister_netdevice warnings when deleting netns

2015-09-29 Thread Eric W. Biederman
Anand Gurram  writes:

>>If the message just spits out a few times and then goes away it simply
>>means that something is taking a while to clean up and drop its
>>reference.
>
> The message just spits out a few times and then goes away. I am trying
> to debug why the cleanup is taking so long and where the device is
> still referenced. Any pointers on debugging such issues would be of
> great help.

The one thing I have done in the past is to instrument dev_hold
and dev_put and look where in the code the stragglers are coming from
(when I can reproduce the issue reliably).

Sometimes people have addressed this class of issue with code review,
but with a slow cleanup you can't catch this by finding a missing
dev_put.

It takes some creativity to find these as people rarely make the same
mistake twice.
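
As a rough userspace illustration of that kind of instrumentation (not
the kernel's dev_hold()/dev_put(), and with hypothetical names
throughout), one can tag every hold and put with a call-site label and
then list the sites whose counts do not balance:

```c
#include <stdio.h>
#include <string.h>

#define MAX_SITES 16

/* One row per call site that took or dropped a reference. */
struct ref_site {
	const char *where;
	int holds;
	int puts;
};

/* Hypothetical stand-in for a device with an instrumented refcount. */
struct dev_sketch {
	int refcnt;
	int nr_sites;
	struct ref_site sites[MAX_SITES];
};

/* Find the row for a call site, creating it on first use. */
static struct ref_site *site_lookup(struct dev_sketch *dev, const char *where)
{
	int i;

	for (i = 0; i < dev->nr_sites; i++)
		if (strcmp(dev->sites[i].where, where) == 0)
			return &dev->sites[i];
	if (dev->nr_sites == MAX_SITES)
		return NULL; /* table full: still count, but do not log */
	dev->sites[dev->nr_sites].where = where;
	dev->sites[dev->nr_sites].holds = 0;
	dev->sites[dev->nr_sites].puts = 0;
	return &dev->sites[dev->nr_sites++];
}

static void hold_at(struct dev_sketch *dev, const char *where)
{
	struct ref_site *s = site_lookup(dev, where);

	dev->refcnt++;
	if (s)
		s->holds++;
}

static void put_at(struct dev_sketch *dev, const char *where)
{
	struct ref_site *s = site_lookup(dev, where);

	dev->refcnt--;
	if (s)
		s->puts++;
}

/* Print every site whose holds and puts differ; return how many. */
static int report_stragglers(const struct dev_sketch *dev)
{
	int i, n = 0;

	for (i = 0; i < dev->nr_sites; i++) {
		const struct ref_site *s = &dev->sites[i];

		if (s->holds != s->puts) {
			printf("%s: %d holds, %d puts\n",
			       s->where, s->holds, s->puts);
			n++;
		}
	}
	return n;
}
```

A per-site imbalance is only a hint, since a reference taken in one
place is often dropped in another, but it narrows down where to look.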

Eric


Re: BUG: Mellanox crash with iommu=soft and swiotlb=force

2015-09-29 Thread Christoffer Dall
On Tue, Sep 29, 2015 at 12:59:35AM +0300, Or Gerlitz wrote:
> On Tue, Sep 29, 2015 at 12:04 AM, Christoffer Dall
>  wrote:
> > Hi,
> >
> > In doing some performance experiments I found that using a 10G Mellanox
> > MX354A Dual port FDR CX3 device on a server running Apache and running
> > ab against that server causes the system to crash with 'iommu=soft
> > swiotlb=force'.  The same behavior is seen without these options on Dom0
> > running under Xen.
> >
> > I have tried this on v4.0 and v4.3-rc3.
> 
> Woops, needs looking indeed. Unfortunately many people in the team are
> off for the Sukkot holiday with real backing to business coming to
> play on Oct 6th -- not sure we can really respond on that before.
> 
> Are you running over ARM? which? if not, is that x86 64bit?
> 
I'm running on x86_64.

Thanks,
-Christoffer


RE: [RFC v2 net-next 05/10] qede: Add basic network device support

2015-09-29 Thread Yuval Mintz
> > >> > +struct qede_rx_queue {
> > >> > +  __le16  *hw_cons_ptr;
> > >>
> > >> The __ variants of constants should be reserved for use in user
> > >> visible API's
> > >
> > > Really? If so, this needs to be fixed not only here but in lots of
> > > places in the series [e.g., entire HW HSI uses __le variants instead of 
> > > le].
> > > But why is it so? I.e., I understand that __le16 is defined in the
> > > uapi directory and thus accessible to users, but why the distinction?
> >
> > Because it shows whether the type is something exposed to userspace or not.
> >
> > If there are places where this is done incorrectly in the tree, it is
> > not a legitimate reason for you to do so as well.
> 
> Obviously.
> We'll fix all of those for next version.

I've taken a look and I couldn't find reference to 'le16' anywhere under
drivers/net/ethernet/. And 'le16' is actually a fs/ntfs/types.h definition.

What am I missing here?



Re: [v3,1/2] airo: fix IW_AUTH_ALG_OPEN_SYSTEM

2015-09-29 Thread Kalle Valo

> IW_AUTH_ALG_OPEN_SYSTEM is ambiguous in set_auth for WEP as
> wpa_supplicant uses it for both no encryption and WEP open system.
> Cache the last mode set (only of these two) and use it here.
> 
> This allows wpa_supplicant to work with unencrypted APs.
> 
> Signed-off-by: Ondrej Zary 

Thanks, 2 patches applied to wireless-drivers-next.git:

4a0f2ea79797 airo: fix IW_AUTH_ALG_OPEN_SYSTEM
2b8fa9e870b7 airo: Implement netif_carrier_on/off

Kalle Valo


Re: [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk

2015-09-29 Thread kbuild test robot
Hi Daniel,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout bcddf1d1557b51bef5ef395b5b7dd7b512794e2f
  # save the attached .config to linux build tree
  make.cross ARCH=m68k 

All warnings (new ones prefixed by >>):

   net/netfilter/nft_meta.c: In function 'nft_meta_get_eval':
>> net/netfilter/nft_meta.c:34:15: warning: unused variable 'sk' [-Wunused-variable]
 struct sock *sk = pkt->sk;
  ^

vim +/sk +34 net/netfilter/nft_meta.c

18  #include 
19  #include 
20  #include 
21  #include 
22  #include 
23  #include  /* for TCP_TIME_WAIT */
24  #include 
25  #include 
26  
27  void nft_meta_get_eval(const struct nft_expr *expr,
28 struct nft_regs *regs,
29 const struct nft_pktinfo *pkt)
30  {
31  const struct nft_meta *priv = nft_expr_priv(expr);
32  const struct net_device *in = pkt->in, *out = pkt->out;
33  struct sk_buff *skb = pkt->skb;
  > 34  struct sock *sk = pkt->sk;
35  u32 *dest = &regs->data[priv->dreg];
36  
37  switch (priv->key) {
38  case NFT_META_LEN:
39  *dest = skb->len;
40  break;
41  case NFT_META_PROTOCOL:
42  *dest = 0;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: unregister_netdevice warnings when deleting netns

2015-09-29 Thread Anand Gurram
Thanks Julian, I will check if these two commits work for me.

>I think, they will appear in other stable versions too...

Yes, I saw them in other versions, the fix which is suggested in those
branches didn't work for me.
Hope the above two commits help.

Regards,
Anand

On Tue, Sep 29, 2015 at 12:42 AM, Julian Anastasov  wrote:
>
> Hello,
>
> On Mon, 28 Sep 2015, Anand Gurram wrote:
>
>> I am currently using kernel version 3.16.7 on a linux switch.
>> While creating and destroying network namespaces I am observing below logs
>> on the console
>> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
>>
>> Can you please suggest and provide instructions on how to debug this issue.
>> If any fix already available can you please point me to the link.
>
> There are two commits from Linux 4.2 that may help:
>
> commit e9e4dd3267d0 ("net: do not process device backlog during 
> unregistration")
> commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
>
> For now I see them only in 3.2.71+ and 3.12.48+.
> I think, they will appear in other stable versions too...
>
> Regards
>
> --
> Julian Anastasov 


Re: unregister_netdevice warnings when deleting netns

2015-09-29 Thread Anand Gurram
>If the message just spits out a few times and then goes away it simply
>means that something is taking a while to clean up and drop its
>reference.

The message just spits out a few times and then goes away. I am trying
to debug why the cleanup is taking so long and where the device is
still referenced. Any pointers on debugging such issues would be of
great help.

Best Regards,
Anand

On Tue, Sep 29, 2015 at 3:05 AM, Eric W. Biederman
 wrote:
> Julian Anastasov  writes:
>
>>   Hello,
>>
>> On Mon, 28 Sep 2015, Anand Gurram wrote:
>>
>>> I am currently using kernel version 3.16.7 on a linux switch.
>>> While creating and destroying network namespaces I am observing below logs
>>> on the console
>>> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
>>>
>>> Can you please suggest and provide instructions on how to debug this issue.
>>> If any fix already available can you please point me to the link.
>>
>>   There are two commits from Linux 4.2 that may help:
>>
>> commit e9e4dd3267d0 ("net: do not process device backlog during 
>> unregistration")
>> commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
>>
>>   For now I see them only in 3.2.71+ and 3.12.48+.
>> I think, they will appear in other stable versions too...
>
> If that message repeats indefinitely it means there is a leaked
> reference to the network namespace's lo device.
>
> If the message just spits out a few times and then goes away it simply
> means that something is taking a while to clean up and drop its
> reference.
>
> This is slightly complicated by the fact that it is not uncommon when a
> network device goes away to redirect all references to itself to the lo
> device.
>
> Eric


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-29 Thread David Woodhouse
On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
> 
> > I've been pondering a bit of a redesign in this space.  I think the
> > skb struct should be explicit in its instructions to hardware for
> > which offloads to do for each packet.
> >
> > In this way, the stack would be *directly* telling the drivers what to
> > do (and what not to do), solving all sorts of bugs and really improving
> > driver reliability and implementation.
> >
> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
> the driver unambiguously what to do wrt checksum offload?

Right. That's precisely what we *do* have. But as things stand, we
can't *use* it to its full capability.

It's fine for decent devices which can handle such explicit
instructions (advertised by the NETIF_F_HW_CSUM feature).

The problem is the crappy devices that can *only* checksum UDP and TCP
frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a
primitive attempt *not* to feed arbitrary checksum requests to such
hardware. But we fail — we end up feeding *all* Legacy IP packets to a
NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM
device, regardless of whether they're *actually* TCP or UDP packets.

That's the problem I'm trying to solve. And then we *can* make full use
of the generic checksum offload (I had it working for ICMPv6 at one
point: http://lists.openwall.net/netdev/2013/01/14/38 ).
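
Roughly, the difference between what happens today and what I would
like looks like the sketch below.  The F_* bits, the *_SKETCH constants
and both helpers are hypothetical and simplified; they are not the
kernel's NETIF_F_* values or its can_checksum_protocol() code, and a
real check would also have to find the L4 protocol reliably.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel's NETIF_F_* values differ. */
#define F_HW_CSUM   (1u << 0) /* generic: honours csum_start/csum_offset */
#define F_IP_CSUM   (1u << 1) /* TCP and UDP over IPv4 only */
#define F_IPV6_CSUM (1u << 2) /* TCP and UDP over IPv6 only */

#define ETH_P_IP_SKETCH   0x0800
#define ETH_P_IPV6_SKETCH 0x86DD

#define PROTO_TCP_SKETCH    6
#define PROTO_UDP_SKETCH    17
#define PROTO_ICMPV6_SKETCH 58

/* Decide on the L3 protocol alone: the behaviour described above. */
static bool can_csum_l3_only(uint32_t features, uint16_t ethertype)
{
	if (features & F_HW_CSUM)
		return true;
	if (ethertype == ETH_P_IP_SKETCH)
		return (features & F_IP_CSUM) != 0;
	if (ethertype == ETH_P_IPV6_SKETCH)
		return (features & F_IPV6_CSUM) != 0;
	return false;
}

/* Also require TCP or UDP before trusting a protocol-specific device. */
static bool can_csum_strict(uint32_t features, uint16_t ethertype,
			    uint8_t l4_proto)
{
	if (features & F_HW_CSUM)
		return true;
	if (l4_proto != PROTO_TCP_SKETCH && l4_proto != PROTO_UDP_SKETCH)
		return false;
	return can_csum_l3_only(features, ethertype);
}
```

With the L3-only check an ICMPv6 packet is handed to an IPv6-only
TCP/UDP checksum engine; the strict variant falls back to software.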

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation





Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 11:28:15 -0500 (CDT)
Christoph Lameter  wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > > Do you really need separate parameters for freelist_head? If you just want
> > > to deal with one object pass it as freelist_head and set cnt = 1?
> >
> > Yes, I need it.  We need to know both the head and tail of the list to
> > splice it.
> 
> Ok so this is to avoid having to scan the list to its end?

True.

> x is the end
> of the list and freelist_head the beginning. That is weird.

Yes, it is a bit weird... the bulk free of freelists comes out as a
second-class citizen.

Okay, I'll try to change the slab_free() and __slab_free() calls to
take a "head" and a "tail", and let tail be NULL on single-object free
so the compiler can do constant propagation (thus keeping the existing
fastpath unaffected).  The same code should be generated, but we will
have a more intuitive API.
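
Something along these lines, as a userspace sketch only.  The names
obj_sketch, freelist_sketch, free_sketch() and free_one_sketch() are
hypothetical, and none of the locking or cmpxchg handling of the real
__slab_free() is shown.

```c
#include <stddef.h>

/* A free object; its first word links to the next free object. */
struct obj_sketch {
	struct obj_sketch *next;
};

/* Hypothetical stand-in for a per-slab freelist. */
struct freelist_sketch {
	struct obj_sketch *first;
	size_t count;
};

/*
 * Free the objects head..tail, already linked through ->next, with one
 * splice.  tail == NULL means "single object", so the single-free path
 * passes a compile-time NULL and the branch can be folded away.
 */
static inline void free_sketch(struct freelist_sketch *fl,
			       struct obj_sketch *head,
			       struct obj_sketch *tail, size_t cnt)
{
	if (!tail)
		tail = head;
	tail->next = fl->first; /* splice: no need to walk the list */
	fl->first = head;
	fl->count += cnt;
}

/* Single-object free expressed through the same helper. */
static inline void free_one_sketch(struct freelist_sketch *fl,
				   struct obj_sketch *obj)
{
	free_sketch(fl, obj, NULL, 1);
}
```

Knowing the tail up front is what avoids scanning the list to its end.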

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH 0/5] net: m68k: Allow modular build

2015-09-29 Thread Geert Uytterhoeven
Hi David, Paul,

This patch series makes the remaining m68k Ethernet drivers modular.
It's an alternative to the last 3 patches of Paul Gortmaker's series
"[PATCH net-next 0/6] make non-modular code explicitly non-modular".

Note that "[PATCH 5/5] net: macmace: Allow modular build" depends on
"[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base
address to modules". Feel free to take the dependency through the netdev
tree to avoid modular build breakage.

This was compile-tested only (mac_defconfig + allmodconfig) due to lack
of hardware.

Thanks!

Geert Uytterhoeven (5):
  net: mac8390: Allow modular build
  net: 7990: Export lance_poll() to modules
  net: hplance: Allow modular build
  m68k/mac: Export Peripheral System Controller (PSC) base address to
modules
  net: macmace: Allow modular build

 arch/m68k/mac/psc.c |  1 +
 drivers/net/ethernet/8390/Kconfig   |  2 +-
 drivers/net/ethernet/8390/mac8390.c | 32 ++--
 drivers/net/ethernet/amd/7990.c |  1 +
 drivers/net/ethernet/amd/Kconfig|  2 +-
 drivers/net/ethernet/apple/Kconfig  |  2 +-
 6 files changed, 15 insertions(+), 25 deletions(-)

-- 
1.9.1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 3/5] net: hplance: Allow modular build

2015-09-29 Thread Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven 
---
 drivers/net/ethernet/amd/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index afc62ea804fc35d4..0038709fd317d83c 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -100,7 +100,7 @@ config DECLANCE
  DEPCA series.  (This chipset is better known via the NE2100 cards.)
 
 config HPLANCE
-   bool "HP on-board LANCE support"
+   tristate "HP on-board LANCE support"
depends on DIO
select CRC32
---help---
-- 
1.9.1



[PATCH 5/5] net: macmace: Allow modular build

2015-09-29 Thread Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven 
---
This depends on "[PATCH 4/5] m68k/mac: Export Peripheral System
Controller (PSC) base address to modules".
---
 drivers/net/ethernet/apple/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apple/Kconfig b/drivers/net/ethernet/apple/Kconfig
index d19a41b0c6d26691..31071297896c96b5 100644
--- a/drivers/net/ethernet/apple/Kconfig
+++ b/drivers/net/ethernet/apple/Kconfig
@@ -51,7 +51,7 @@ config BMAC
  will be called bmac.
 
 config MACMACE
-   bool "Macintosh (AV) onboard MACE ethernet"
+   tristate "Macintosh (AV) onboard MACE ethernet"
depends on MAC
select CRC32
---help---
-- 
1.9.1



[PATCH 2/5] net: 7990: Export lance_poll() to modules

2015-09-29 Thread Geert Uytterhoeven
If CONFIG_HPLANCE=m and CONFIG_NET_POLL_CONTROLLER=y:

ERROR: "lance_poll" [drivers/net/ethernet/amd/hplance.ko] undefined!

Add the missing export to fix this.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/net/ethernet/amd/7990.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/amd/7990.c b/drivers/net/ethernet/amd/7990.c
index 98a10d555b793e02..66d0b73c39c03ba2 100644
--- a/drivers/net/ethernet/amd/7990.c
+++ b/drivers/net/ethernet/amd/7990.c
@@ -661,6 +661,7 @@ void lance_poll(struct net_device *dev)
spin_unlock(>devlock);
lance_interrupt(dev->irq, dev);
 }
+EXPORT_SYMBOL_GPL(lance_poll);
 #endif
 
 MODULE_LICENSE("GPL");
-- 
1.9.1



[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base address to modules

2015-09-29 Thread Geert Uytterhoeven
If CONFIG_MACMACE=m:

ERROR: "psc" [drivers/net/ethernet/apple/macmace.ko] undefined!

Add the missing export to fix this.

Signed-off-by: Geert Uytterhoeven 
---
I'm OK with this going in through the netdev tree, as "net: macmace:
Allow modular build" depends on it.
---
 arch/m68k/mac/psc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/m68k/mac/psc.c b/arch/m68k/mac/psc.c
index cd38f29955c87421..2290c0cae48beb8a 100644
--- a/arch/m68k/mac/psc.c
+++ b/arch/m68k/mac/psc.c
@@ -29,6 +29,7 @@
 
 int psc_present;
 volatile __u8 *psc;
+EXPORT_SYMBOL_GPL(psc);
 
 /*
  * Debugging dump, used in various places to see what's going on.
-- 
1.9.1



[PATCH 1/5] net: mac8390: Allow modular build

2015-09-29 Thread Geert Uytterhoeven
The modular driver supports only one card, just like the built-in
driver.

Note that this limitation is a problem which affects all Nubus card
drivers, because they have to do all their own bus matching, because
Nubus still lacks the necessary driver model support.

Suggested-by: Finn Thain 
Signed-off-by: Geert Uytterhoeven 
---
 drivers/net/ethernet/8390/Kconfig   |  2 +-
 drivers/net/ethernet/8390/mac8390.c | 32 ++--
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig
index edf72258ab1ddabe..29c3075bfb052f1d 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -64,7 +64,7 @@ config ARM_ETHERH
  should say Y to this option if you wish to use it with Linux.
 
 config MAC8390
-   bool "Macintosh NS 8390 based ethernet cards"
+   tristate "Macintosh NS 8390 based ethernet cards"
depends on MAC
select CRC32
---help---
diff --git a/drivers/net/ethernet/8390/mac8390.c b/drivers/net/ethernet/8390/mac8390.c
index 65cf60f6718c52fa..b9283901136e974a 100644
--- a/drivers/net/ethernet/8390/mac8390.c
+++ b/drivers/net/ethernet/8390/mac8390.c
@@ -454,34 +454,22 @@ MODULE_AUTHOR("David Huggins-Daines  and 
others");
 MODULE_DESCRIPTION("Macintosh NS8390-based Nubus Ethernet driver");
 MODULE_LICENSE("GPL");
 
-/* overkill, of course */
-static struct net_device *dev_mac8390[15];
-int init_module(void)
+static struct net_device *dev_mac8390;
+
+int __init init_module(void)
 {
-   int i;
-   for (i = 0; i < 15; i++) {
-   struct net_device *dev = mac8390_probe(-1);
-   if (IS_ERR(dev))
-   break;
-   dev_mac890[i] = dev;
-   }
-   if (!i) {
-   pr_notice("No useable cards found, driver NOT installed.\n");
-   return -ENODEV;
+   dev_mac8390 = mac8390_probe(-1);
+   if (IS_ERR(dev_mac8390)) {
+   pr_warn("mac8390: No card found\n");
+   return PTR_ERR(dev_mac8390);
}
return 0;
 }
 
-void cleanup_module(void)
+void __exit cleanup_module(void)
 {
-   int i;
-   for (i = 0; i < 15; i++) {
-   struct net_device *dev = dev_mac890[i];
-   if (dev) {
-   unregister_netdev(dev);
-   free_netdev(dev);
-   }
-   }
+   unregister_netdev(dev_mac8390);
+   free_netdev(dev_mac8390);
 }
 
 #endif /* MODULE */
-- 
1.9.1



Re: Per-flow IPv4 ECMP

2015-09-29 Thread Peter Nørlund
On Mon, 28 Sep 2015 06:48:09 -0700
roopa  wrote:

> On 9/28/15, 1:57 AM, Matthew Dupre wrote:
> > Hi,
> >
> > I'm interested in the Linux kernel's support for per-flow IPv4 ECMP
> > (i.e. consistent path selection based on a hash of the connection
> > tuple).  I'd been led to believe[1] that this depended on the route
> > cache, which was removed in 3.6.
> >
> > However, I tested a route with multiple next hops on a 3.10 and
> > 3.13 kernel, and ECMP was per-flow!  Obviously I'm pleased that
> > this was the case, but I'd like to understand why this is
> > supported, and whether I can rely on it in future.
> >
> > Could anyone give me a little clarification on whether this is now
> > supported by some means other than the route cache, and whether
> > that support is intended to be continued?
> >
> This is being worked on currently by Peter Nørlund
> https://lwn.net/Articles/657431/

Hi,

AFAIK if you create a socket on the machine having the multipath route,
each socket will seemingly be mapped to a particular path, and it will
behave as per-flow. But if you use the machine as a router, each packet
is forwarded independently, potentially hitting different paths each
time.
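
To make the contrast above concrete, here is a minimal userspace sketch, not kernel code. The struct, the FNV-1a hash and both selector functions are assumptions made up for illustration; round-robin merely stands in for whatever per-packet scheduling the forwarding path really uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the classic 5-tuple. */
struct flow_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* Mix one 32-bit word into an FNV-1a hash, byte by byte (illustrative only). */
static uint32_t fnv1a_u32(uint32_t hash, uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        hash ^= (word >> (8 * i)) & 0xffu;
        hash *= 16777619u;
    }
    return hash;
}

static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;

    h = fnv1a_u32(h, k->saddr);
    h = fnv1a_u32(h, k->daddr);
    h = fnv1a_u32(h, ((uint32_t)k->sport << 16) | k->dport);
    h = fnv1a_u32(h, k->proto);
    return h;
}

/* Per-flow: the same tuple always selects the same next hop. */
static unsigned int select_path_per_flow(const struct flow_key *k,
                                         unsigned int n_paths)
{
    assert(n_paths > 0);
    return flow_hash(k) % n_paths;
}

/* Per-packet: the choice changes with every packet, regardless of the flow. */
static unsigned int select_path_per_packet(unsigned int *counter,
                                           unsigned int n_paths)
{
    assert(n_paths > 0);
    return (*counter)++ % n_paths;
}
```

With the per-flow selector, every packet of one connection lands on the same next hop; with the per-packet selector, consecutive packets of that connection can alternate between next hops.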

Best Regards
  Peter Nørlund


RE: [PATCH v4 net-next 0/2] ipv4: Hash-based multipath routing

2015-09-29 Thread David Laight
From: Peter Nørlund
> Sent: 29 September 2015 12:29
...
> As for using L4 hashing with anycast, CloudFlare apparently does L4
> hashing - they could have disabled it, but they didn't. Besides,
> analysis of my own load balancers showed that only one in every
> 500,000,000 packets is fragmented. And even if I hit a fragmented
> packet, it is only a problem if the packet hits the wrong load
> balancer, and if that load balancer hasn't been updated with the state
> from another load balancer (that is, one of the very first packets). It
> is still a possible scenario though - especially with large HTTP
> cookies or file uploads. But apparently it is a common problem that IP
> fragments get dropped on the Internet, so I suspect that ECMP+Anycast
> sites are just part of the pool of problematic sites for people with
> fragments.

Fragmentation is usually more of an issue with UDP than TCP.
Some SIP messages can get fragmented...

David



Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular

2015-09-29 Thread Paul Gortmaker
[Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular] On 
29/09/2015 (Tue 16:32) Finn Thain wrote:

> 
> Hi Paul,
> 
> On Mon, 28 Sep 2015, Paul Gortmaker wrote:
> 
> > On 28/09/2015 (Mon 23:09) Geert Uytterhoeven wrote:
> > 
> > > Hi Paul,
> ...
> > > 
> > > Why did you choose this approach?
> > > What about changing the "bool"s to "tristate"s in Kconfig instead?
> > 
> > Long answer is here:
> > 
> > https://lkml.org/lkml/2015/8/24/888
> 
> You wrote, "If there was demand for them to be tristate, then it would 
> have happened by now." I don't follow your reasoning. You might just as 
> well remove entire drivers and then argue, "If there was demand for 
> drivers without bugs, then someone would have written them by now".

I don't see those two sentences being alike, but in the end it does
not matter, since Geert has decided to do the conversion and test it.

And whatever code gets removed is never truly gone anyway; it lives on
in the git history forever.

Thanks,
Paul.
--

> 
> Perhaps you meant, "If there was sufficient demand for them to be 
> tristate, then sufficient resources would have been marshalled, as 
> required to get an enhancement written, tested, submitted, reviewed and 
> merged in the mainline kernel."
> 
> > 
> > To summarize, it adds functionality to code I can't test, and with 300 
> > or so of these, it already has been a large time sink.  Add to that 
> > extending the functionality and testing the new functionality, and it 
> > does not scale. Plus if something hasn't allowed tristate for over 10 
> > years, where is the value in adding it now?
> 
> There is value to be gained by completing the tristate support, and there 
> is value destroyed by removing the partial tristate support.
> 
> I'm not involved in building distro kernels, but I know that Debian's 
> would benefit from these tristates, because it would reduce the size of 
> the m68k multi-platform kernel binary.
> 
> And even if it is dead code you aim to remove, a lot of people have worked 
> on it (according to git blame), including myself. We should not disregard 
> that effort when we could leverage it instead.
> 
> For the macmace driver in particular, I did the platform driver 
> conversion, and it should work as a module. I did not change it to 
> tristate at the time because I did not want to deal with the question of 
> the 'psc' global, which lacks an EXPORT_SYMBOL(psc). Anyway, I'll send a 
> patch if Geert doesn't do so first.
> 
> > 
> > > I gave it a try, and with some small changes the three m68k ethernet 
> > > drivers build fine as modular drivers. I can send patches if you like 
> > > it.
> > 
> > Per above, I don't see the value in it, but if you want to do it and 
> > test it and own submitting the patches, then I can drop the 
> > corresponding ones from my queue.
> 
> I can't test right now but I have the hardware and will attend to any 
> issues if need be. I do not expect any issues, because the modular option 
> seems to involve the same code paths in the driver.
> 
> If the CONFIG_MACMACE=m option was implemented badly and did not work 
> correctly, at least it couldn't be called a regression, presuming that 'm' 
> builds okay, and that the default was 'y' or 'n'.
> 
> > Either way we get the code matching the Kconfig which is what I'm after 
> > out of this.
> 
> Yes, me too.
> 
> > 
> > Note that if you do decide to do this, the one driver really needs more 
> > > than just a one-line tristate change; it has super ancient init code that 
> > predates module_init and probably needs an update.
> 
> I think the solution for mac8390 is to do in the modular case exactly what 
> Space.c does in the built-in case. That would mean that the modular driver 
> would support only one card, just like the built-in driver. (That 
> limitation is a problem which affects all Nubus card drivers, because they 
> have to do all their own bus matching, because Nubus still lacks the 
> necessary driver model support.)
> 
> I haven't looked at amd/hplance, but I expect that the issues are similar.
> 
> Geert, do you plan to send patches for any of these drivers?
> 
> Regards,
> Finn
> 
> > 
> > Thanks,
> > Paul.
> > --
> > 
> > > 
> > > Thanks!
> > > 
> > > Gr{oetje,eeting}s,
> > > 
> > > Geert
> 


Re: [PATCH v4 net-next 0/2] ipv4: Hash-based multipath routing

2015-09-29 Thread Peter Nørlund
On Mon, 28 Sep 2015 19:55:41 -0700 (PDT)
David Miller  wrote:

> From: David Miller 
> Date: Mon, 28 Sep 2015 19:33:55 -0700 (PDT)
> 
> > From: Peter Nørlund 
> > Date: Wed, 23 Sep 2015 21:49:35 +0200
> > 
> >> When the routing cache was removed in 3.6, the IPv4 multipath
> >> algorithm changed from more or less being destination-based into
> >> being quasi-random per-packet scheduling. This increases the risk
> >> of out-of-order packets and makes it impossible to use multipath
> >> together with anycast services.
> >> 
> >> This patch series replaces the old implementation with flow-based
> >> load balancing based on a hash over the source and destination
> >> addresses.
> > 
> > This isn't perfect but it's a significant step in the right
> > direction. So I'm going to apply this to net-next now and we can
> > make incremental improvements upon it.
> 
> Actually, I had to revert, this doesn't build:
> 
> [davem@localhost net-next]$ make -s -j8
> Setup is 16876 bytes (padded to 16896 bytes).
> System is 10011 kB
> CRC 324f2811
> Kernel: arch/x86/boot/bzImage is ready  (#337)
> ERROR: "__ip_route_output_key_hash" [net/dccp/dccp_ipv4.ko] undefined!
> scripts/Makefile.modpost:90: recipe for target '__modpost' failed
> make[1]: *** [__modpost] Error 1
> Makefile:1095: recipe for target 'modules' failed
> make: *** [modules] Error 2

Sorry! I forgot to update the EXPORT_SYMBOL_GPL line.

In the meantime I've been doing some thinking (and measuring).
Considering that the broader goal is to make IPv6 and IPv4 behave as
identically as possible, it is probably not such a bad idea to just use
the flow dissector + modulo in the IPv4 code too - the patch will be
simpler than the current one.

I fear the performance impact of the flow dissector though - some of my
earlier measurements showed that it was 5-6 times slower than the
simple one I used. But maybe it is better to streamline the IPv4/IPv6
multipath first and then improve upon it afterward (make it work, make
it right, make it fast).

As for using L4 hashing with anycast, CloudFlare apparently does L4
hashing - they could have disabled it, but they didn't. Besides,
analysis of my own load balancers showed that only one in every
500,000,000 packets is fragmented. And even if I hit a fragmented
packet, it is only a problem if the packet hits the wrong load
balancer, and if that load balancer hasn't been updated with the state
from another load balancer (that is, one of the very first packets). It
is still a possible scenario though - especially with large HTTP
cookies or file uploads. But apparently it is a common problem that IP
fragments get dropped on the Internet, so I suspect that ECMP+Anycast
sites are just part of the pool of problematic sites for people with
fragments.
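
A small userspace sketch of why fragments matter for L4 hashing follows. Everything in it (the structs, the three policies, the helper names) is hypothetical and only models the reasoning above: a non-first fragment carries no L4 header, so a hash that includes ports cannot see the same inputs for every fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of an IPv4 packet, for illustration only. */
struct pkt_info {
    uint32_t saddr, daddr;
    uint16_t sport, dport;   /* only valid when the L4 header is present */
    uint16_t frag_offset;    /* in 8-byte units; 0 for the first fragment */
    bool     more_frags;
};

/* The fields that would be fed into the multipath hash. */
struct hash_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

enum hash_policy {
    HASH_L3,            /* addresses only */
    HASH_L4_NAIVE,      /* use ports whenever an L4 header is visible */
    HASH_L4_FRAG_SAFE   /* use ports only for unfragmented packets */
};

static bool is_fragment(const struct pkt_info *p)
{
    return p->frag_offset != 0 || p->more_frags;
}

static bool has_l4_header(const struct pkt_info *p)
{
    return p->frag_offset == 0;
}

static struct hash_key build_key(const struct pkt_info *p,
                                 enum hash_policy policy)
{
    assert(p != NULL);

    struct hash_key k = { p->saddr, p->daddr, 0, 0 };
    bool use_ports = false;

    if (policy == HASH_L4_NAIVE)
        use_ports = has_l4_header(p);
    else if (policy == HASH_L4_FRAG_SAFE)
        use_ports = !is_fragment(p);

    if (use_ports) {
        k.sport = p->sport;
        k.dport = p->dport;
    }
    return k;
}

static bool key_equal(const struct hash_key *a, const struct hash_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}
```

Under the naive policy the first and later fragments of one datagram produce different keys, so they may take different paths. The fragment-safe policy keeps all fragments of a datagram together, but they can still diverge from the unfragmented packets of the same flow, which is the residual problem described above.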

I'm still unsettled as to whether the ICMP handling belongs to the
kernel or not. The above breakage was in the ICMP-part of the
patchset, so judging from that, I guess it wasn't out of the question.
But in the "IPv4 and IPv6 should behave identically"-mindset, it probably
belongs to a separate, future patchset, adding ICMP handling to both
IPv4 and IPv6 - and it is actually more important for IPv6 than IPv4
since PMTUD cannot be disabled.

Best Regards,
  Peter Nørlund


[PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains

2015-09-29 Thread Daniel Mack
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for IPv4 unicast and multicast ports has been
looked up.

Signed-off-by: Daniel Mack 
---
 net/ipv4/netfilter/nf_tables_ipv4.c | 10 +-
 net/ipv4/tcp_ipv4.c |  8 
 net/ipv4/udp.c  | 15 +++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c
index abee60a..2e65664 100644
--- a/net/ipv4/netfilter/nf_tables_ipv4.c
+++ b/net/ipv4/netfilter/nf_tables_ipv4.c
@@ -50,11 +50,11 @@ struct nft_af_info nft_af_ipv4 __read_mostly = {
.owner  = THIS_MODULE,
.nops   = 1,
.hooks  = {
-   [NF_INET_LOCAL_IN]  = nft_do_chain_ipv4,
-   [NF_INET_LOCAL_OUT] = nft_ipv4_output,
-   [NF_INET_FORWARD]   = nft_do_chain_ipv4,
-   [NF_INET_PRE_ROUTING]   = nft_do_chain_ipv4,
-   [NF_INET_POST_ROUTING]  = nft_do_chain_ipv4,
+   [NF_INET_LOCAL_IN]  = nft_do_chain_ipv4,
+   [NF_INET_LOCAL_OUT] = nft_ipv4_output,
+   [NF_INET_FORWARD]   = nft_do_chain_ipv4,
+   [NF_INET_PRE_ROUTING]   = nft_do_chain_ipv4,
+   [NF_INET_POST_ROUTING]  = nft_do_chain_ipv4,
[NF_INET_LOCAL_SOCKET_IN]   = nft_do_chain_ipv4,
},
 };
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 93898e0..83bc7b3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -78,6 +78,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1594,6 +1595,13 @@ int tcp_v4_rcv(struct sk_buff *skb)
if (!sk)
goto no_tcp_socket;
 
+   ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1) {
+   sock_put(sk);
+   return 0;
+   }
+
 process:
if (sk->sk_state == TCP_TIME_WAIT)
goto do_time_wait;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f7d1d5e..57c7571 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -97,6 +97,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1633,7 +1634,14 @@ static void flush_stack(struct sock **stack, unsigned int count,
struct sock *sk;
 
for (i = 0; i < count; i++) {
+   int ret;
sk = stack[i];
+
+   ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1)
+   continue;
+
if (likely(!skb1))
skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
 
@@ -1820,6 +1828,13 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (sk) {
int ret;
 
+   ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1) {
+   sock_put(sk);
+   return 0;
+   }
+
if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 inet_compute_pseudo);
-- 
2.5.0



[PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains

2015-09-29 Thread Daniel Mack
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for IPv6 unicast and multicast ports has been
looked up.

Signed-off-by: Daniel Mack 
---
 net/ipv6/netfilter/nf_tables_ipv6.c | 14 --
 net/ipv6/tcp_ipv6.c |  8 
 net/ipv6/udp.c  |  9 +
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/netfilter/nf_tables_ipv6.c b/net/ipv6/netfilter/nf_tables_ipv6.c
index c8148ba..53c7923 100644
--- a/net/ipv6/netfilter/nf_tables_ipv6.c
+++ b/net/ipv6/netfilter/nf_tables_ipv6.c
@@ -49,11 +49,12 @@ struct nft_af_info nft_af_ipv6 __read_mostly = {
.owner  = THIS_MODULE,
.nops   = 1,
.hooks  = {
-   [NF_INET_LOCAL_IN]  = nft_do_chain_ipv6,
-   [NF_INET_LOCAL_OUT] = nft_ipv6_output,
-   [NF_INET_FORWARD]   = nft_do_chain_ipv6,
-   [NF_INET_PRE_ROUTING]   = nft_do_chain_ipv6,
-   [NF_INET_POST_ROUTING]  = nft_do_chain_ipv6,
+   [NF_INET_LOCAL_IN]  = nft_do_chain_ipv6,
+   [NF_INET_LOCAL_OUT] = nft_ipv6_output,
+   [NF_INET_FORWARD]   = nft_do_chain_ipv6,
+   [NF_INET_PRE_ROUTING]   = nft_do_chain_ipv6,
+   [NF_INET_POST_ROUTING]  = nft_do_chain_ipv6,
+   [NF_INET_LOCAL_SOCKET_IN]   = nft_do_chain_ipv6,
},
 };
 EXPORT_SYMBOL_GPL(nft_af_ipv6);
@@ -95,7 +96,8 @@ static const struct nf_chain_type filter_ipv6 = {
  (1 << NF_INET_LOCAL_OUT) |
  (1 << NF_INET_FORWARD) |
  (1 << NF_INET_PRE_ROUTING) |
- (1 << NF_INET_POST_ROUTING),
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_ipv6_init(void)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 97d9314..0b0706d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1392,6 +1393,13 @@ static int tcp_v6_rcv(struct sk_buff *skb)
if (!sk)
goto no_tcp_socket;
 
+   ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1) {
+   sock_put(sk);
+   return 0;
+   }
+
 process:
if (sk->sk_state == TCP_TIME_WAIT)
goto do_time_wait;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0aba654..99df081 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -746,7 +747,15 @@ static void flush_stack(struct sock **stack, unsigned int count,
unsigned int i;
 
for (i = 0; i < count; i++) {
+   int ret;
+
sk = stack[i];
+
+   ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1)
+   continue;
+
if (likely(!skb1))
skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
if (!skb1) {
-- 
2.5.0



[PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type

2015-09-29 Thread Daniel Mack
Add a new chain type NF_INET_LOCAL_SOCKET_IN which is run after the
input demux is complete and the final destination socket (if any)
has been determined.

This helps filtering packets based on information stored in the
destination socket, such as cgroup controller supplied net class IDs.

Note that rules in such chains are not processed in case the local
listen socket cannot be determined. Hence, if no application is
listening on a specific port, the resulting error code that is sent
back to the remote peer can't be controlled with rules in
NF_INET_LOCAL_SOCKET_IN chains.
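
The hook masks touched in the diff below follow a simple bit-per-hook scheme. As a rough userspace sketch of that scheme: the enum mirrors the order visible in the uapi hunk, while the two mask macros and hook_allowed() are hypothetical names introduced here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the uapi enum in the patch, redeclared for this sketch. */
enum nf_inet_hooks {
    NF_INET_PRE_ROUTING,
    NF_INET_LOCAL_IN,
    NF_INET_FORWARD,
    NF_INET_LOCAL_OUT,
    NF_INET_POST_ROUTING,
    NF_INET_LOCAL_SOCKET_IN,
    NF_INET_NUMHOOKS
};

/* Hypothetical masks: the filter chain type before and after the patch. */
#define DEMO_FILTER_HOOKS_OLD ((1u << NF_INET_LOCAL_IN) | \
                               (1u << NF_INET_LOCAL_OUT) | \
                               (1u << NF_INET_FORWARD) | \
                               (1u << NF_INET_PRE_ROUTING) | \
                               (1u << NF_INET_POST_ROUTING))
#define DEMO_FILTER_HOOKS_NEW (DEMO_FILTER_HOOKS_OLD | \
                               (1u << NF_INET_LOCAL_SOCKET_IN))

/* A chain may only be attached to a hook whose bit is set in the mask. */
static bool hook_allowed(unsigned int hook_mask, enum nf_inet_hooks hook)
{
    assert(hook < NF_INET_NUMHOOKS);
    return (hook_mask & (1u << hook)) != 0;
}
```

Because the new enumerator is appended just before NF_INET_NUMHOOKS, the existing hook numbers (and therefore the existing mask bits) keep their values.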

Signed-off-by: Daniel Mack 
---
 include/uapi/linux/netfilter.h  | 1 +
 net/ipv4/netfilter/iptable_filter.c | 1 +
 net/ipv4/netfilter/nf_tables_ipv4.c | 4 +++-
 net/netfilter/nf_tables_inet.c  | 3 ++-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index d93f949..96c3f8b 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -49,6 +49,7 @@ enum nf_inet_hooks {
NF_INET_FORWARD,
NF_INET_LOCAL_OUT,
NF_INET_POST_ROUTING,
+   NF_INET_LOCAL_SOCKET_IN,
NF_INET_NUMHOOKS
 };
 
diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c
index a0f3bec..d65616a5 100644
--- a/net/ipv4/netfilter/iptable_filter.c
+++ b/net/ipv4/netfilter/iptable_filter.c
@@ -21,6 +21,7 @@ MODULE_AUTHOR("Netfilter Core Team ");
 MODULE_DESCRIPTION("iptables filter table");
 
 #define FILTER_VALID_HOOKS ((1 << NF_INET_LOCAL_IN) | \
+   (1 << NF_INET_LOCAL_SOCKET_IN) | \
(1 << NF_INET_FORWARD) | \
(1 << NF_INET_LOCAL_OUT))
 
diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c
index aa180d3..abee60a 100644
--- a/net/ipv4/netfilter/nf_tables_ipv4.c
+++ b/net/ipv4/netfilter/nf_tables_ipv4.c
@@ -55,6 +55,7 @@ struct nft_af_info nft_af_ipv4 __read_mostly = {
[NF_INET_FORWARD]   = nft_do_chain_ipv4,
[NF_INET_PRE_ROUTING]   = nft_do_chain_ipv4,
[NF_INET_POST_ROUTING]  = nft_do_chain_ipv4,
+   [NF_INET_LOCAL_SOCKET_IN]   = nft_do_chain_ipv4,
},
 };
 EXPORT_SYMBOL_GPL(nft_af_ipv4);
@@ -96,7 +97,8 @@ static const struct nf_chain_type filter_ipv4 = {
  (1 << NF_INET_LOCAL_OUT) |
  (1 << NF_INET_FORWARD) |
  (1 << NF_INET_PRE_ROUTING) |
- (1 << NF_INET_POST_ROUTING),
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_ipv4_init(void)
diff --git a/net/netfilter/nf_tables_inet.c b/net/netfilter/nf_tables_inet.c
index 9dd2d21..5544196 100644
--- a/net/netfilter/nf_tables_inet.c
+++ b/net/netfilter/nf_tables_inet.c
@@ -75,7 +75,8 @@ static const struct nf_chain_type filter_inet = {
  (1 << NF_INET_LOCAL_OUT) |
  (1 << NF_INET_FORWARD) |
  (1 << NF_INET_PRE_ROUTING) |
- (1 << NF_INET_POST_ROUTING),
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_inet_init(void)
-- 
2.5.0



[PATCH RFC 7/7] net: dccp: hook up LOCAL_SOCKET_IN netfilter chains

2015-09-29 Thread Daniel Mack
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for DCCP packets has been looked up.

Signed-off-by: Daniel Mack 
---
 net/dccp/ipv4.c | 14 +-
 net/dccp/ipv6.c | 14 +-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ccf4c56..9746138 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -807,7 +808,7 @@ static int dccp_v4_rcv(struct sk_buff *skb)
const struct dccp_hdr *dh;
const struct iphdr *iph;
struct sock *sk;
-   int min_cov;
+   int ret, min_cov;
 
/* Step 1: Check header basics */
 
@@ -857,6 +858,17 @@ static int dccp_v4_rcv(struct sk_buff *skb)
 
/*
 * Step 2:
+*  ... or any LOCAL_SOCKET_IN rule disagrees ...
+*/
+   ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1) {
+   sock_put(sk);
+   return 0;
+   }
+
+   /*
+* Step 2:
 *  ... or S.state == TIMEWAIT,
 *  Generate Reset(No Connection) unless P.type == Reset
 *  Drop packet and return
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 5165571..63b51e6 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -691,7 +692,7 @@ static int dccp_v6_rcv(struct sk_buff *skb)
 {
const struct dccp_hdr *dh;
struct sock *sk;
-   int min_cov;
+   int ret, min_cov;
 
/* Step 1: Check header basics */
 
@@ -732,6 +733,17 @@ static int dccp_v6_rcv(struct sk_buff *skb)
 
/*
 * Step 2:
+*  ... or any LOCAL_SOCKET_IN rule disagrees ...
+*/
+   ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+ skb, skb->dev, NULL, NULL);
+   if (ret != 1) {
+   sock_put(sk);
+   return 0;
+   }
+
+   /*
+* Step 2:
 *  ... or S.state == TIMEWAIT,
 *  Generate Reset(No Connection) unless P.type == Reset
 *  Drop packet and return
-- 
2.5.0



[PATCH RFC 0/7] netfilter: introduce new chain type for local socket input

2015-09-29 Thread Daniel Mack
Here is a patch set that enables full support for match rules
that take into account information about the local receiver socket.

Such rules allow administrators to implement per-application or
per-container firewalls which filter any type of network traffic
directed to or originated from a set of processes on a system,
independent of, for instance, local or remote port numbers.

In theory, such rules are already supported through the 'meta' and
'socket' rule types, but they currently do not work for ingress packets
delivered to unestablished listener sockets. NF_INET_LOCAL_IN chains
are iterated once the IP stack decides a packet is directed to the
local system, but before the local listener socket is determined.
Consequently, filter rules that are based on information derived from
the listener socket cannot be used reliably.

This patch set introduces a new chain type (NF_INET_LOCAL_SOCKET_IN)
that is iterated at a later point in time than NF_INET_LOCAL_IN, after
the listener socket demux has succeeded. Chains of this type are hence
only looked at _if_ there is a local listener.
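
The ordering described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the types, the listener table, the verdict values and deliver() are all invented for illustration and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };
enum result {
    RES_DROPPED_LOCAL_IN,
    RES_NO_SOCKET,
    RES_DROPPED_SOCKET_IN,
    RES_DELIVERED
};

/* Hypothetical stand-ins for a listener socket and an incoming packet. */
struct sock_model   { unsigned short port; unsigned int classid; };
struct packet_model { unsigned short dport; };

typedef enum verdict (*hook_fn)(const struct packet_model *pkt,
                                const struct sock_model *sk);

static const struct sock_model demo_listeners[] = { { 80, 42 }, { 22, 7 } };

/* Socket demux: find the listener for the destination port, if any. */
static const struct sock_model *demo_lookup(const struct packet_model *pkt)
{
    size_t n = sizeof(demo_listeners) / sizeof(demo_listeners[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_listeners[i].port == pkt->dport)
            return &demo_listeners[i];
    return NULL;
}

static enum verdict accept_all(const struct packet_model *pkt,
                               const struct sock_model *sk)
{
    (void)pkt;
    (void)sk;
    return VERDICT_ACCEPT;
}

/* A rule that needs socket information: only classid 42 may receive. */
static enum verdict allow_classid_42(const struct packet_model *pkt,
                                     const struct sock_model *sk)
{
    (void)pkt;
    return (sk && sk->classid == 42) ? VERDICT_ACCEPT : VERDICT_DROP;
}

static enum result deliver(hook_fn local_in, hook_fn local_socket_in,
                           const struct packet_model *pkt)
{
    assert(local_in && local_socket_in && pkt);

    /* LOCAL_IN runs before the demux, so no socket is known yet. */
    if (local_in(pkt, NULL) != VERDICT_ACCEPT)
        return RES_DROPPED_LOCAL_IN;

    const struct sock_model *sk = demo_lookup(pkt);
    if (!sk)
        return RES_NO_SOCKET;   /* the new chain is never consulted */

    /* LOCAL_SOCKET_IN runs after the demux and sees the listener. */
    if (local_socket_in(pkt, sk) != VERDICT_ACCEPT)
        return RES_DROPPED_SOCKET_IN;

    return RES_DELIVERED;
}
```

Installing the socket-dependent rule in the early hook drops everything in this model, because the rule never sees a socket there; installing it in the late hook filters per listener as intended.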

The input paths for TCP and UDP for IPv4 and IPv6 are patched for
the new hook-up, as well as SCTP and DCCP.

Possible performance penalties for setups in which this new type is
not used need to be considered, but I lack a good test case for that.
I'm sure some people reading this do have proper test scenarios they
can run with these patches applied. I'd be very interested in these
numbers.

For SCTP and DCCP, I admittedly lack a proper test case as well, and
for UDP, I'm aware of a possible deadlock due to nf_hook() being called
under hslot->lock when the stack is flushed preliminarily from
__udp[46]_lib_mcast_deliver(). That's fixable, but I've kept it simple
for this RFC.

Only nftables is supported so far, but enabling iptables as well would
be straightforward.

I also have trivial patches for libnftnl and nftables to enable
the userspace part.

I'd appreciate some feedback about this approach.


Thanks,
Daniel


Daniel Mack (7):
  netfilter: add socket to struct nft_pktinfo
  netfilter: nft_meta: look at pkt->sk rather than skb->sk
  netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains
  net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains
  net: sctp: hook up LOCAL_SOCKET_IN netfilter chains
  net: dccp: hook up LOCAL_SOCKET_IN netfilter chains

 include/net/netfilter/nf_tables.h   |  2 ++
 include/uapi/linux/netfilter.h  |  1 +
 net/dccp/ipv4.c | 14 +-
 net/dccp/ipv6.c | 14 +-
 net/ipv4/netfilter/iptable_filter.c |  1 +
 net/ipv4/netfilter/nf_tables_ipv4.c | 14 --
 net/ipv4/tcp_ipv4.c |  8 
 net/ipv4/udp.c  | 15 +++
 net/ipv6/netfilter/nf_tables_ipv6.c | 14 --
 net/ipv6/tcp_ipv6.c |  8 
 net/ipv6/udp.c  |  9 +
 net/netfilter/nf_tables_inet.c  |  3 ++-
 net/netfilter/nft_meta.c|  7 ---
 net/sctp/input.c| 11 ++-
 14 files changed, 102 insertions(+), 19 deletions(-)

-- 
2.5.0



[PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk

2015-09-29 Thread Daniel Mack
pkt->sk is set to whatever was passed to nf_hook() by the caller,
and for post demux chains, this is the one that should be looked
at, as skb->sk is still NULL at this point in time.

Signed-off-by: Daniel Mack 
---
 net/netfilter/nft_meta.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index cb2f13e..f195bee 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
   const struct nft_pktinfo *pkt)
 {
const struct nft_meta *priv = nft_expr_priv(expr);
-   const struct sk_buff *skb = pkt->skb;
const struct net_device *in = pkt->in, *out = pkt->out;
+   struct sk_buff *skb = pkt->skb;
+   struct sock *sk = pkt->sk;
u32 *dest = &regs->data[priv->dreg];
 
switch (priv->key) {
@@ -168,9 +169,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
break;
 #ifdef CONFIG_CGROUP_NET_CLASSID
case NFT_META_CGROUP:
-   if (skb->sk == NULL || !sk_fullsock(skb->sk))
+   if (sk == NULL || !sk_fullsock(sk))
goto err;
-   *dest = skb->sk->sk_classid;
+   *dest = sk->sk_classid;
break;
 #endif
default:
-- 
2.5.0



[PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo

2015-09-29 Thread Daniel Mack
The high-level netfilter hook API already enables users to pass a socket,
but that information is lost when the chains are walked.

In order to let internal eval callbacks use the passed socket rather than
skb->sk, add a pointer of type 'struct sock' to 'struct nft_pktinfo' and
set that field via nft_set_pktinfo().

This allows us to run filter chains from situations where skb->sk is unset.
Fall back to skb->sk in case state->sk is NULL, so filter callbacks can be
written in a generic way.
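
A minimal userspace sketch of that fallback, assuming simplified stand-in structs (none of the `*_model` types or set_pktinfo() exist in the kernel under these names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sock_model       { unsigned int classid; };
struct skb_model        { struct sock_model *sk; };
struct hook_state_model { struct sock_model *sk; };
struct pktinfo_model {
    struct sock_model *sk;
    struct skb_model  *skb;
};

static void set_pktinfo(struct pktinfo_model *pkt, struct skb_model *skb,
                        const struct hook_state_model *state)
{
    assert(pkt && skb && state);

    /* Portable spelling of the GNU shorthand `state->sk ?: skb->sk`:
     * prefer the socket handed to the hook, else the one on the skb. */
    pkt->sk = state->sk ? state->sk : skb->sk;
    pkt->skb = skb;
}
```

Eval callbacks that read pkt->sk then work both on paths where the caller passes a socket explicitly and on paths where only skb->sk is set.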

Signed-off-by: Daniel Mack 
---
 include/net/netfilter/nf_tables.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index aa8bee7..05e97ed 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -13,6 +13,7 @@
 #define NFT_JUMP_STACK_SIZE16
 
 struct nft_pktinfo {
+   struct sock *sk;
struct sk_buff  *skb;
const struct net_device *in;
const struct net_device *out;
@@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
+   pkt->sk = state->sk ?: skb->sk;
pkt->skb = skb;
pkt->in = pkt->xt.in = state->in;
pkt->out = pkt->xt.out = state->out;
-- 
2.5.0



[PATCH RFC 6/7] net: sctp: hook up LOCAL_SOCKET_IN netfilter chains

2015-09-29 Thread Daniel Mack
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for SCTP packets has been looked up.

Signed-off-by: Daniel Mack 
---
 net/sctp/input.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index b6493b3..0652406 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -45,6 +45,7 @@
 #include  /* For struct list_head */
 #include 
 #include 
+#include 
 #include  /* For struct timeval */
 #include 
 #include 
@@ -115,7 +116,7 @@ int sctp_rcv(struct sk_buff *skb)
struct sctphdr *sh;
union sctp_addr src;
union sctp_addr dest;
-   int family;
+   int ret, family;
struct sctp_af *af;
struct net *net = dev_net(skb->dev);
 
@@ -180,6 +181,14 @@ int sctp_rcv(struct sk_buff *skb)
rcvr = asoc ? &asoc->base : &ep->base;
sk = rcvr->sk;
 
+   /* Iterate through rules in LOCAL_SOCKET_IN,
+* now that the receiver is known.
+*/
+   ret = nf_hook(family == AF_INET ? NFPROTO_IPV4 : NFPROTO_IPV6,
+ NF_INET_LOCAL_SOCKET_IN, sk, skb, skb->dev, NULL, NULL);
+   if (ret != 1)
+   goto discard_release;
+
/*
 * If a frame arrives on an interface and the receiving socket is
 * bound to another interface, via SO_BINDTODEVICE, treat it as OOTB
-- 
2.5.0



Re: 4.3-rc3 Regression: NFS access stall by commit 6ae459bdaaee

2015-09-29 Thread Takashi Iwai
On Tue, 29 Sep 2015 02:35:04 +0200,
Pravin Shelar wrote:
> 
> On Mon, Sep 28, 2015 at 6:12 AM, Takashi Iwai  wrote:
> > [I resent this since the previous mail didn't go out properly, as it
> >  seems; apologies if you already read it, please disregard]
> >
> > Hi,
> >
> > I noticed that NFS access from my workstation slowed down drastically,
> > almost stalls, with the fresh 4.3-rc3.  There are no particular kernel
> > errors / warnings.
> >
> > Then I performed a git bisection, and it led to the commit:
> > 6ae459bdaaeebc632b16e54dcbabb490c6931d61
> > skbuff: Fix skb checksum flag on skb pull
> >
> > Reverting this commit from 4.3-rc3 fixed the issue indeed.
> >
> > Could you take a look at this?  I added Trond to Cc in case he might
> > already know of it.
> >
> I sent out a fix for a similar issue. Can you try the posted patch?
> https://patchwork.ozlabs.org/patch/523632/

Yes, the patch fixes the problem, thanks.  Feel free to take my
tested-by tag:
  Tested-by: Takashi Iwai 

But I guess the real fix is only the first chunk and the latter is
nothing but a cleanup?  If so, it'd be better to split it.


Takashi


[PATCH 1/1] xfrm: Fix state threshold configuration from userspace

2015-09-29 Thread Michael Rossberg
Allow changing the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg 
---
 net/xfrm/xfrm_user.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..24e06a2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1928,8 +1928,10 @@ static int xfrm_new_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
struct nlattr *re = attrs[XFRMA_REPLAY_ESN_VAL];
struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+   struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+   struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
-   if (!lt && !rp && !re)
+   if (!lt && !rp && !re && !et && !rt)
return err;
 
/* pedantic mode - thou shalt sayeth replaceth */
-- 
2.1.4

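The effect of the widened check can be sketched outside the kernel. This is
a minimal userspace illustration, not the kernel code: struct ae_attrs and
the two ae_update_allowed_*() helpers are hypothetical stand-ins for the
attrs[] netlink array and the early-return test in xfrm_new_ae().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed netlink attributes; a NULL
 * member means userspace did not supply that attribute. */
struct ae_attrs {
	const void *replay_val;     /* XFRMA_REPLAY_VAL */
	const void *replay_esn_val; /* XFRMA_REPLAY_ESN_VAL */
	const void *ltime_val;      /* XFRMA_LTIME_VAL */
	const void *etimer_thresh;  /* XFRMA_ETIMER_THRESH */
	const void *replay_thresh;  /* XFRMA_REPLAY_THRESH */
};

/* Old check: a request carrying only thresholds is rejected. */
static bool ae_update_allowed_old(const struct ae_attrs *a)
{
	return a->ltime_val || a->replay_val || a->replay_esn_val;
}

/* New check: any one of the five attributes is enough to proceed. */
static bool ae_update_allowed_new(const struct ae_attrs *a)
{
	return a->ltime_val || a->replay_val || a->replay_esn_val ||
	       a->etimer_thresh || a->replay_thresh;
}
```

With only a replay threshold set, the old predicate returns false and the
new one returns true, which is the behaviour change the patch describes.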


[PATCH net-next 4/6] net: switchdev: pass callback to dump operation

2015-09-29 Thread Vivien Didelot
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev dump operation to:

int switchdev_port_obj_dump(struct net_device *dev,
enum switchdev_obj_id id, void *obj,
int (*cb)(void *obj));

This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.

Driver implementations of the dump operation can now expect this specific
structure and call the callback with it. Drivers have been changed
accordingly.

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 21 +
 include/net/switchdev.h  |  9 +---
 net/dsa/slave.c  | 26 +++--
 net/switchdev/switchdev.c| 45 ++--
 4 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 78fd443..107adb6 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4538,10 +4538,10 @@ static int rocker_port_obj_del(struct net_device *dev,
 }
 
 static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
-   struct switchdev_obj *obj)
+   struct switchdev_obj_fdb *fdb,
+   int (*cb)(void *obj))
 {
struct rocker *rocker = rocker_port->rocker;
-   struct switchdev_obj_fdb *fdb = &obj->u.fdb;
struct rocker_fdb_tbl_entry *found;
struct hlist_node *tmp;
unsigned long lock_flags;
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
-   err = obj->cb(obj);
+   err = cb(fdb);
if (err)
break;
}
@@ -4566,9 +4566,9 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 }
 
 static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
-struct switchdev_obj *obj)
+struct switchdev_obj_vlan *vlan,
+   int (*cb)(void *obj))
 {
-   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
u16 vid;
int err = 0;
 
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
if (rocker_vlan_id_is_internal(htons(vid)))
vlan->flags |= BRIDGE_VLAN_INFO_PVID;
vlan->vid_begin = vlan->vid_end = vid;
-   err = obj->cb(obj);
+   err = cb(vlan);
if (err)
break;
}
@@ -4588,17 +4588,18 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
 }
 
 static int rocker_port_obj_dump(struct net_device *dev,
-   struct switchdev_obj *obj)
+   enum switchdev_obj_id id, void *obj,
+   int (*cb)(void *obj))
 {
const struct rocker_port *rocker_port = netdev_priv(dev);
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_dump(rocker_port, obj);
+   err = rocker_port_fdb_dump(rocker_port, obj, cb);
break;
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlan_dump(rocker_port, obj);
+   err = rocker_port_vlan_dump(rocker_port, obj, cb);
break;
default:
err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 9ef7c56..0a80f2a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -120,7 +120,8 @@ struct switchdev_ops {
int (*switchdev_port_obj_del)(struct net_device *dev,
  struct switchdev_obj *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
- struct switchdev_obj *obj);
+  enum switchdev_obj_id id, void *obj,
+  int (*cb)(void *obj));
 };
 
 enum switchdev_notifier_type {
@@ -152,7 +153,8 @@ int switchdev_port_attr_set(struct net_device *dev,
struct switchdev_attr *attr);
 int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj);
 int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj);
-int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj);
+int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id,
+   

[PATCH net-next 0/6] net: switchdev: use specific switchdev_obj_*

2015-09-29 Thread Vivien Didelot
This patchset changes switchdev add, del, dump operations from this:

int (*switchdev_port_obj_add)(struct net_device *dev,
  struct switchdev_obj *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
  struct switchdev_obj *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
  struct switchdev_obj *obj);

to something similar to the notifier_call callback of a notifier_block:

int (*switchdev_port_obj_add)(struct net_device *dev,
  enum switchdev_obj_id id,
  const void *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
  enum switchdev_obj_id id,
  const void *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
   enum switchdev_obj_id id, void *obj,
   int (*cb)(void *obj));

This allows the caller to pass and expect back a specific switchdev_obj_*
structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.

This will simplify pushing the callback function down to the drivers.
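The shape of the new dump operation can be sketched in plain C. This is
only an illustration under simplified assumptions: enum obj_id, struct
obj_vlan, port_vlan_dump(), port_obj_dump() and count_vlan() are
hypothetical names, and the three hard-coded VLAN ids stand in for whatever
a real driver would walk.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the switchdev types. */
enum obj_id { OBJ_PORT_VLAN, OBJ_PORT_FDB };

struct obj_vlan {
	uint16_t vid_begin;
	uint16_t vid_end;
};

/* Driver side: fill the caller's typed object once per entry and hand
 * it back through cb; a non-zero return from cb stops the walk. */
static int port_vlan_dump(struct obj_vlan *vlan, int (*cb)(void *obj))
{
	int err = 0;

	for (uint16_t vid = 1; vid <= 3; vid++) {
		vlan->vid_begin = vlan->vid_end = vid;
		err = cb(vlan);
		if (err)
			break;
	}
	return err;
}

/* Dispatch on the object id; obj is only interpreted by the handler
 * that knows its concrete type. */
static int port_obj_dump(enum obj_id id, void *obj, int (*cb)(void *obj))
{
	switch (id) {
	case OBJ_PORT_VLAN:
		return port_vlan_dump(obj, cb);
	default:
		return -EOPNOTSUPP;
	}
}

/* Caller side: the callback receives the same typed object back. */
static int seen_vids;

static int count_vlan(void *obj)
{
	struct obj_vlan *vlan = obj;

	seen_vids += vlan->vid_end - vlan->vid_begin + 1;
	return 0;
}
```

A caller would invoke port_obj_dump(OBJ_PORT_VLAN, &vlan, count_vlan); the
generic wrapper object and its embedded cb member are no longer needed.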

The first 3 patches get rid of the dev parameter of the dump callback, since it
is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers)
may not have easy access to it.

Patches 4 and 5 implement the change in the switchdev operations and its users.

Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
removes the latter.

Vivien Didelot (6):
  net: switchdev: remove dev in port_vlan_dump_put
  net: switchdev: move dev in switchdev_fdb_dump
  net: switchdev: remove dev from switchdev_obj cb
  net: switchdev: pass callback to dump operation
  net: switchdev: abstract object in add/del ops
  net: switchdev: extract struct switchdev_obj_*

 drivers/net/ethernet/rocker/rocker.c |  42 
 include/net/switchdev.h  |  80 ---
 net/bridge/br_fdb.c  |  11 +--
 net/bridge/br_vlan.c |  24 ++---
 net/dsa/slave.c  |  46 +
 net/switchdev/switchdev.c| 184 ---
 6 files changed, 186 insertions(+), 201 deletions(-)

-- 
2.5.3



[PATCH net-next 00/14] tcp: listener refactoring preparations

2015-09-29 Thread Eric Dumazet
This patch series makes changes to TCP/DCCP stacks so that
we can switch listener code to lockless mode.

This is done by marking const the listener socket in all
appropriate paths.

The FastOpen code had to be changed so that it no longer dynamically
allocates a very small structure, which makes the following changes simpler.

Eric Dumazet (14):
  tcp/dccp: constify send_synack and send_reset socket argument
  tcp: remove unused len argument from tcp_rcv_state_process()
  tcp: remove tcp_rcv_state_process() tcp_hdr argument
  dccp: use inet6_csk_route_req() helper
  inet: constify inet_csk_route_child_sock() socket argument
  inet: constify __inet_inherit_port() sock argument
  net: constify sk_gfp_atomic() sock argument
  dccp: constify dccp_create_openreq_child() sock argument
  tcp: constify tcp_create_openreq_child() socket argument
  tcp/dccp: constify syn_recv_sock() method sock argument
  tcp: cookie_init_sequence() cleanups
  tcp: constify tcp_v{4|6}_route_req() sock argument
  tcp: constify tcp_syn_flood_action() socket argument
  tcp: prepare fastopen code for upcoming listener changes

 include/linux/tcp.h | 22 --
 include/net/inet6_connection_sock.h |  2 +-
 include/net/inet_connection_sock.h  |  5 +++--
 include/net/inet_hashtables.h   |  2 +-
 include/net/request_sock.h  | 16 ++--
 include/net/sock.h  |  2 +-
 include/net/tcp.h   | 28 ++--
 net/core/request_sock.c |  9 -
 net/dccp/dccp.h |  6 +++---
 net/dccp/ipv4.c |  5 +++--
 net/dccp/ipv6.c | 24 +++-
 net/dccp/minisocks.c|  4 ++--
 net/ipv4/af_inet.c  | 10 +++---
 net/ipv4/inet_connection_sock.c | 19 +--
 net/ipv4/inet_hashtables.c  |  2 +-
 net/ipv4/syncookies.c   |  6 +-
 net/ipv4/tcp.c  | 14 ++
 net/ipv4/tcp_fastopen.c | 10 +-
 net/ipv4/tcp_input.c| 17 +
 net/ipv4/tcp_ipv4.c | 13 +++--
 net/ipv4/tcp_minisocks.c|  7 ---
 net/ipv6/inet6_connection_sock.c|  8 +---
 net/ipv6/syncookies.c   |  5 +
 net/ipv6/tcp_ipv6.c | 33 ++---
 24 files changed, 118 insertions(+), 151 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 03/14] tcp: remove tcp_rcv_state_process() tcp_hdr argument

2015-09-29 Thread Eric Dumazet
Factorize code to get tcp header from skb. It makes no sense
to duplicate code in callers.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h| 3 +--
 net/ipv4/tcp_input.c | 4 ++--
 net/ipv4/tcp_ipv4.c  | 2 +-
 net/ipv4/tcp_minisocks.c | 2 +-
 net/ipv6/tcp_ipv6.c  | 2 +-
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1cfdedbe47e1..1fe0bd458cb4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -365,8 +365,7 @@ void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
 void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
-int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
- const struct tcphdr *th);
+int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 const struct tcphdr *th, unsigned int len);
 void tcp_rcv_space_adjust(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dcbddf12f4b3..67b27aee8d28 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5698,11 +5698,11 @@ reset_and_undo:
  * address independent.
  */
 
-int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
- const struct tcphdr *th)
+int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
+   const struct tcphdr *th = tcp_hdr(skb);
struct request_sock *req;
int queued = 0;
bool acceptable;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7e5ae1e01009..67c0dc8bddbf 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1420,7 +1420,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb);
 
-   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb))) {
+   if (tcp_rcv_state_process(sk, skb)) {
rsk = sk;
goto reset;
}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 9c7c61cf7462..139668cc2347 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -821,7 +821,7 @@ int tcp_child_process(struct sock *parent, struct sock *child,
int state = child->sk_state;
 
if (!sock_owned_by_user(child)) {
-   ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb));
+   ret = tcp_rcv_state_process(child, skb);
/* Wakeup parent, send SIGIO */
if (state == TCP_SYN_RECV && child->sk_state != state)
parent->sk_data_ready(parent);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index b6e473f0f62e..334d548a0cf6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1272,7 +1272,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb);
 
-   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb)))
+   if (tcp_rcv_state_process(sk, skb))
goto reset;
if (opt_skb)
goto ipv6_pktoptions;
-- 
2.6.0.rc2.230.g3dd15c0

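The refactoring pattern is small enough to show in isolation. The sketch
below is a userspace illustration with hypothetical types (struct pkt,
struct hdr, pkt_hdr(), state_process()); it only mirrors the idea that the
callee derives the header from the packet instead of every caller passing
it in.

```c
#include <stdint.h>

/* Hypothetical miniature of an skb: raw bytes plus the offset at which
 * the transport header starts. */
struct pkt {
	const uint8_t *data;
	unsigned int transport_off;
};

struct hdr {
	uint8_t flags;
};

/* Plays the role of tcp_hdr(skb): the one place that knows how to
 * find the header inside the packet. */
static const struct hdr *pkt_hdr(const struct pkt *p)
{
	return (const struct hdr *)(p->data + p->transport_off);
}

/* After the change: no header argument, the callee looks it up. */
static int state_process(const struct pkt *p)
{
	const struct hdr *h = pkt_hdr(p);

	return (h->flags & 0x02) ? 1 : 0;
}
```

Callers shrink from state_process(p, pkt_hdr(p)) to state_process(p), and
the header can no longer be passed inconsistently with the packet.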


[PATCH net-next 04/14] dccp: use inet6_csk_route_req() helper

2015-09-29 Thread Eric Dumazet
Before changing dccp_v6_request_recv_sock() sock argument
to const, we need to get rid of security_sk_classify_flow(),
and it seems doable by reusing inet6_csk_route_req() helper.

This requires adding a proto parameter to inet6_csk_route_req()
instead of assuming the protocol is TCP.

Signed-off-by: Eric Dumazet 
---
 include/net/inet6_connection_sock.h |  2 +-
 net/dccp/ipv6.c | 17 +++--
 net/ipv6/inet6_connection_sock.c|  8 +---
 net/ipv6/tcp_ipv6.c |  7 ---
 4 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 81d937e820c4..79b2a4c09ca6 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -26,7 +26,7 @@ int inet6_csk_bind_conflict(const struct sock *sk,
const struct inet_bind_bucket *tb, bool relax);
 
 struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6,
- const struct request_sock *req);
+ const struct request_sock *req, u8 proto);
 
 struct request_sock *inet6_csk_search_req(struct sock *sk,
  const __be16 rport,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index aa719e700961..0966bc08d362 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -462,22 +462,11 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
if (sk_acceptq_is_full(sk))
goto out_overflow;
 
-   if (dst == NULL) {
-   struct in6_addr *final_p, final;
+   if (!dst) {
struct flowi6 fl6;
 
-   memset(&fl6, 0, sizeof(fl6));
-   fl6.flowi6_proto = IPPROTO_DCCP;
-   fl6.daddr = ireq->ir_v6_rmt_addr;
-   final_p = fl6_update_dst(&fl6, np->opt, &final);
-   fl6.saddr = ireq->ir_v6_loc_addr;
-   fl6.flowi6_oif = sk->sk_bound_dev_if;
-   fl6.fl6_dport = ireq->ir_rmt_port;
-   fl6.fl6_sport = htons(ireq->ir_num);
-   security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
-
-   dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
-   if (IS_ERR(dst))
+   dst = inet6_csk_route_req(sk, &fl6, req, IPPROTO_DCCP);
+   if (!dst)
goto out;
}
 
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 91b7d33f508b..163bfef3e5db 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -67,15 +67,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
 
 struct dst_entry *inet6_csk_route_req(const struct sock *sk,
  struct flowi6 *fl6,
- const struct request_sock *req)
+ const struct request_sock *req,
+ u8 proto)
 {
struct inet_request_sock *ireq = inet_rsk(req);
-   struct ipv6_pinfo *np = inet6_sk(sk);
+   const struct ipv6_pinfo *np = inet6_sk(sk);
struct in6_addr *final_p, final;
struct dst_entry *dst;
 
memset(fl6, 0, sizeof(*fl6));
-   fl6->flowi6_proto = IPPROTO_TCP;
+   fl6->flowi6_proto = proto;
fl6->daddr = ireq->ir_v6_rmt_addr;
final_p = fl6_update_dst(fl6, np->opt, &final);
fl6->saddr = ireq->ir_v6_loc_addr;
@@ -91,6 +92,7 @@ struct dst_entry *inet6_csk_route_req(const struct sock *sk,
 
return dst;
 }
+EXPORT_SYMBOL(inet6_csk_route_req);
 
 /*
  * request_sock (formerly open request) hash tables.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 334d548a0cf6..092a23ef1feb 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -447,7 +447,8 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
int err = -ENOMEM;
 
/* First, grab a route. */
-   if (!dst && (dst = inet6_csk_route_req(sk, fl6, req)) == NULL)
+   if (!dst && (dst = inet6_csk_route_req(sk, fl6, req,
+  IPPROTO_TCP)) == NULL)
goto done;
 
skb = tcp_make_synack(sk, dst, req, foc);
@@ -694,7 +695,7 @@ static struct dst_entry *tcp_v6_route_req(struct sock *sk, struct flowi *fl,
 {
if (strict)
*strict = true;
-   return inet6_csk_route_req(sk, &fl->u.ip6, req);
+   return inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP);
 }
 
 struct request_sock_ops tcp6_request_sock_ops __read_mostly = {
@@ -1058,7 +1059,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
goto out_overflow;
 
if (!dst) {
-   dst = inet6_csk_route_req(sk, &fl6, req);
+   dst = inet6_csk_route_req(sk, &fl6, req, IPPROTO_TCP);
if (!dst)
goto out;
}
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next 14/14] tcp: prepare fastopen code for upcoming listener changes

2015-09-29 Thread Eric Dumazet
While auditing TCP stack for upcoming 'lockless' listener changes,
I found I had to change fastopen_init_queue() to properly init the object
before publishing it.

Otherwise another CPU could try to lock the spinlock before it gets
properly initialized.

Instead of adding appropriate barriers, just remove the dynamic memory
allocation:
- Structure is 28 bytes on 64bit arches. Using additional 8 bytes
  for holding a pointer seems overkill.
- Two listeners can share the same cache line and performance would suffer.

If we really want to save a few bytes, we would instead dynamically allocate
the whole struct request_sock_queue in the future.
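The design choice can be sketched in userspace C. This is an illustration
under simplified assumptions, not the kernel code: the *_sketch types and
functions are hypothetical, and the spinlock is reduced to a plain int.

```c
#include <stddef.h>

/* Hypothetical, simplified types; the spinlock is reduced to an int. */
struct fastopen_queue_sketch {
	int lock;     /* stands in for spinlock_t */
	int qlen;
	int max_qlen; /* non-zero means TFO is enabled */
};

struct accept_queue_sketch {
	void *accept_head;
	struct fastopen_queue_sketch fastopenq; /* embedded, not a pointer */
};

/* Everything, lock included, is set up together with the parent, so
 * no other CPU can observe a half-built fastopen queue. */
static void accept_queue_init(struct accept_queue_sketch *q)
{
	q->accept_head = NULL;
	q->fastopenq.lock = 0;
	q->fastopenq.qlen = 0;
	q->fastopenq.max_qlen = 0;
}

/* Enabling TFO becomes a plain store that cannot fail. */
static void fastopen_queue_tune_sketch(struct accept_queue_sketch *q,
				       int backlog)
{
	q->fastopenq.max_qlen = backlog;
}

static int fastopen_enabled(const struct accept_queue_sketch *q)
{
	return q->fastopenq.max_qlen != 0;
}
```

Because the queue always exists, "is TFO enabled" turns from a NULL-pointer
test into a max_qlen != 0 test, and the -ENOMEM path disappears.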

Signed-off-by: Eric Dumazet 
---
 include/linux/tcp.h | 22 --
 include/net/request_sock.h  |  7 ++-
 net/core/request_sock.c |  9 -
 net/ipv4/af_inet.c  | 10 +++---
 net/ipv4/inet_connection_sock.c | 17 -
 net/ipv4/tcp.c  | 14 ++
 net/ipv4/tcp_fastopen.c | 10 +-
 net/ipv4/tcp_ipv4.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 9 files changed, 35 insertions(+), 60 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index fcb573be75d9..e442e6e9a365 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -382,25 +382,11 @@ static inline bool tcp_passive_fastopen(const struct sock *sk)
tcp_sk(sk)->fastopen_rsk != NULL);
 }
 
-extern void tcp_sock_destruct(struct sock *sk);
-
-static inline int fastopen_init_queue(struct sock *sk, int backlog)
+static inline void fastopen_queue_tune(struct sock *sk, int backlog)
 {
-   struct request_sock_queue *queue =
-   &inet_csk(sk)->icsk_accept_queue;
-
-   if (queue->fastopenq == NULL) {
-   queue->fastopenq = kzalloc(
-   sizeof(struct fastopen_queue),
-   sk->sk_allocation);
-   if (queue->fastopenq == NULL)
-   return -ENOMEM;
-
-   sk->sk_destruct = tcp_sock_destruct;
-   spin_lock_init(&queue->fastopenq->lock);
-   }
-   queue->fastopenq->max_qlen = backlog;
-   return 0;
+   struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
+
+   queue->fastopenq.max_qlen = backlog;
 }
 
 static inline void tcp_saved_syn_free(struct tcp_sock *tp)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index c146b5284786..d2544de329bd 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -180,11 +180,8 @@ struct request_sock_queue {
struct request_sock *rskq_accept_tail;
u8  rskq_defer_accept;
struct listen_sock  *listen_opt;
-   struct fastopen_queue   *fastopenq; /* This is non-NULL iff TFO has been
-* enabled on this listener. Check
-* max_qlen != 0 in fastopen_queue
-* to determine if TFO is enabled
-* right at this moment.
+   struct fastopen_queue   fastopenq;  /* Check max_qlen != 0 to determine
+* if TFO is enabled.
 */
 
/* temporary alignment, our goal is to get rid of this lock */
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index b42f0e26f89e..e22cfa4ed25f 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -59,6 +59,13 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 
get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
spin_lock_init(&queue->syn_wait_lock);
+
+   spin_lock_init(&queue->fastopenq.lock);
+   queue->fastopenq.rskq_rst_head = NULL;
+   queue->fastopenq.rskq_rst_tail = NULL;
+   queue->fastopenq.qlen = 0;
+   queue->fastopenq.max_qlen = 0;
+
queue->rskq_accept_head = NULL;
lopt->nr_table_entries = nr_table_entries;
lopt->max_qlen_log = ilog2(nr_table_entries);
@@ -174,7 +181,7 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
struct sock *lsk = req->rsk_listener;
struct fastopen_queue *fastopenq;
 
-   fastopenq = inet_csk(lsk)->icsk_accept_queue.fastopenq;
+   fastopenq = &inet_csk(lsk)->icsk_accept_queue.fastopenq;
 
tcp_sk(sk)->fastopen_rsk = NULL;
spin_lock_bh(&fastopenq->lock);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8a556643b874..3af85eecbe11 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -219,17 +219,13 @@ int inet_listen(struct socket *sock, int backlog)
 * shutdown() (rather than close()).
 */
if ((sysctl_tcp_fastopen & TFO_SERVER_ENABLE) != 0 &&
-   !inet_csk(sk)->icsk_accept_queue.fastopenq) {
+   !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {

[PATCH net-next 13/14] tcp: constify tcp_syn_flood_action() socket argument

2015-09-29 Thread Eric Dumazet
tcp_syn_flood_action() will soon be called with an unlocked socket.
In order to avoid the SYN flood warning being emitted multiple times,
use xchg().
Extend the max_qlen_log and synflood_warned fields in struct listen_sock
to u32.

Signed-off-by: Eric Dumazet 
---
 include/net/request_sock.h | 5 ++---
 net/ipv4/tcp_input.c   | 9 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 90247ec7955b..c146b5284786 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -129,9 +129,8 @@ struct listen_sock {
atomic_t qlen_dec; /* qlen = qlen_inc - qlen_dec */
atomic_t young_dec;
 
-   u8  max_qlen_log cacheline_aligned_in_smp;
-   u8  synflood_warned;
-   /* 2 bytes hole, try to use */
+   u32 max_qlen_log cacheline_aligned_in_smp;
+   u32 synflood_warned;
u32 hash_rnd;
u32 nr_table_entries;
struct request_sock *syn_table[0];
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 67b27aee8d28..e58cbcd2f07e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6064,7 +6064,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(struct sock *sk,
+static bool tcp_syn_flood_action(const struct sock *sk,
 const struct sk_buff *skb,
 const char *proto)
 {
@@ -6082,11 +6082,12 @@ static bool tcp_syn_flood_action(struct sock *sk,
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPREQQFULLDROP);
 
lopt = inet_csk(sk)->icsk_accept_queue.listen_opt;
-   if (!lopt->synflood_warned && sysctl_tcp_syncookies != 2) {
-   lopt->synflood_warned = 1;
+   if (!lopt->synflood_warned &&
+   sysctl_tcp_syncookies != 2 &&
+   xchg(&lopt->synflood_warned, 1) == 0)
pr_info("%s: Possible SYN flooding on port %d. %s.  Check SNMP counters.\n",
proto, ntohs(tcp_hdr(skb)->dest), msg);
-   }
+
return want_cookie;
 }
 
-- 
2.6.0.rc2.230.g3dd15c0

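The warn-once idiom can be reproduced in userspace with C11 atomics. This
is a sketch, not the kernel code: atomic_exchange() is used here in place
of the kernel's xchg(), and the file-scope flag and should_warn_once() are
hypothetical names standing in for lopt->synflood_warned and the test in
tcp_syn_flood_action().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag; plays the role of lopt->synflood_warned. */
static atomic_uint synflood_warned;

/* Returns true for exactly one caller, however many race here.
 * The plain load is a cheap fast path once the flag is set; the
 * atomic exchange decides the single winner among racing callers. */
static bool should_warn_once(void)
{
	if (atomic_load(&synflood_warned))
		return false;
	return atomic_exchange(&synflood_warned, 1) == 0;
}
```

Without the exchange, two CPUs could both read 0 and both print the
message; with it, only the caller that swaps 0 for 1 proceeds.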


[PATCH net-next 11/14] tcp: cookie_init_sequence() cleanups

2015-09-29 Thread Eric Dumazet
Some common IPv4/IPv6 code can be factorized.
Also constify cookie_init_sequence() socket argument.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h | 19 ++-
 net/ipv4/syncookies.c |  6 +-
 net/ipv6/syncookies.c |  5 +
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a1d2f5d6a430..5aa6672c6f5b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -491,8 +491,9 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb);
 
 /* syncookies: remember time of last synqueue overflow
  * But do not dirty this field too often (once per second is enough)
+ * It is racy as we do not hold a lock, but race is very minor.
  */
-static inline void tcp_synq_overflow(struct sock *sk)
+static inline void tcp_synq_overflow(const struct sock *sk)
 {
unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
unsigned long now = jiffies;
@@ -519,8 +520,7 @@ static inline u32 tcp_cookie_time(void)
 
 u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
  u16 *mssp);
-__u32 cookie_v4_init_sequence(struct sock *sk, const struct sk_buff *skb,
- __u16 *mss);
+__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
 __u32 cookie_init_timestamp(struct request_sock *req);
 bool cookie_timestamp_decode(struct tcp_options_received *opt);
 bool cookie_ecn_ok(const struct tcp_options_received *opt,
@@ -533,8 +533,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
 
 u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph,
  const struct tcphdr *th, u16 *mssp);
-__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb,
- __u16 *mss);
+__u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mss);
 #endif
 /* tcp_output.c */
 
@@ -1709,7 +1708,7 @@ struct tcp_request_sock_ops {
 const struct sock *sk_listener,
 struct sk_buff *skb);
 #ifdef CONFIG_SYN_COOKIES
-   __u32 (*cookie_init_seq)(struct sock *sk, const struct sk_buff *skb,
+   __u32 (*cookie_init_seq)(const struct sk_buff *skb,
 __u16 *mss);
 #endif
struct dst_entry *(*route_req)(struct sock *sk, struct flowi *fl,
@@ -1725,14 +1724,16 @@ struct tcp_request_sock_ops {
 
 #ifdef CONFIG_SYN_COOKIES
 static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops,
-struct sock *sk, struct sk_buff *skb,
+const struct sock *sk, struct sk_buff *skb,
 __u16 *mss)
 {
-   return ops->cookie_init_seq(sk, skb, mss);
+   tcp_synq_overflow(sk);
+   NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
+   return ops->cookie_init_seq(skb, mss);
 }
 #else
 static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops,
-struct sock *sk, struct sk_buff *skb,
+const struct sock *sk, struct sk_buff *skb,
 __u16 *mss)
 {
return 0;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 6595affded20..6b97b5f6457c 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -192,15 +192,11 @@ u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
 }
 EXPORT_SYMBOL_GPL(__cookie_v4_init_sequence);
 
-__u32 cookie_v4_init_sequence(struct sock *sk, const struct sk_buff *skb,
- __u16 *mssp)
+__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mssp)
 {
const struct iphdr *iph = ip_hdr(skb);
const struct tcphdr *th = tcp_hdr(skb);
 
-   tcp_synq_overflow(sk);
-   NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
-
return __cookie_v4_init_sequence(iph, th, mssp);
 }
 
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 2461b3ff9551..7606eba83e7b 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -114,14 +114,11 @@ u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph,
 }
 EXPORT_SYMBOL_GPL(__cookie_v6_init_sequence);
 
-__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb, __u16 *mssp)
+__u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mssp)
 {
const struct ipv6hdr *iph = ipv6_hdr(skb);
const struct tcphdr *th = tcp_hdr(skb);
 
-   tcp_synq_overflow(sk);
-   NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
-
return __cookie_v6_init_sequence(iph, th, mssp);
 }
 
-- 
2.6.0.rc2.230.g3dd15c0

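The factoring can be sketched as follows. This is a simplified userspace
illustration with hypothetical names (struct req_ops_sketch, struct
listener_sketch, v4_init_seq(), v6_init_seq()); the returned values are
made up and only serve to tell the two hooks apart.

```c
#include <stdint.h>

/* Hypothetical, simplified ops table with one per-family hook. */
struct req_ops_sketch {
	uint32_t (*cookie_init_seq)(const uint8_t *pkt, uint16_t *mss);
};

/* Hypothetical listener state touched by the shared bookkeeping. */
struct listener_sketch {
	unsigned long overflow_events; /* role of tcp_synq_overflow() */
	unsigned long cookies_sent;    /* role of the SYNCOOKIESSENT counter */
};

/* Per-family hooks now only compute a value from the packet. */
static uint32_t v4_init_seq(const uint8_t *pkt, uint16_t *mss)
{
	*mss = 1460;
	return 0x1000u + pkt[0];
}

static uint32_t v6_init_seq(const uint8_t *pkt, uint16_t *mss)
{
	*mss = 1440;
	return 0x2000u + pkt[0];
}

/* The shared wrapper does the bookkeeping once, then dispatches. */
static uint32_t cookie_init_sequence_sketch(const struct req_ops_sketch *ops,
					    struct listener_sketch *l,
					    const uint8_t *pkt, uint16_t *mss)
{
	l->overflow_events++;
	l->cookies_sent++;
	return ops->cookie_init_seq(pkt, mss);
}
```

The hooks lose their listener argument entirely, so the bookkeeping lives
in one place instead of being repeated per address family.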

[PATCH net-next 09/14] tcp: constify tcp_create_openreq_child() socket argument

2015-09-29 Thread Eric Dumazet
This method does not touch the listener socket.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h| 2 +-
 net/ipv4/tcp_minisocks.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1fe0bd458cb4..85995c1291d0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -450,7 +450,7 @@ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb);
 void tcp_v4_mtu_reduced(struct sock *sk);
 void tcp_req_err(struct sock *sk, u32 seq);
 int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
-struct sock *tcp_create_openreq_child(struct sock *sk,
+struct sock *tcp_create_openreq_child(const struct sock *sk,
  struct request_sock *req,
  struct sk_buff *skb);
 void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 139668cc2347..897e34273ba3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -441,7 +441,9 @@ EXPORT_SYMBOL_GPL(tcp_ca_openreq_child);
  * Actually, we could lots of memory writes here. tp of listening
  * socket contains all necessary default parameters.
  */
-struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
+struct sock *tcp_create_openreq_child(const struct sock *sk,
+ struct request_sock *req,
+ struct sk_buff *skb)
 {
struct sock *newsk = inet_csk_clone_lock(sk, req, GFP_ATOMIC);
 
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 10/14] tcp/dccp: constify syn_recv_sock() method sock argument

2015-09-29 Thread Eric Dumazet
We'll soon no longer hold the listener socket lock; these
functions do not modify the socket in any way.

Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h | 2 +-
 include/net/tcp.h  | 2 +-
 net/dccp/dccp.h| 2 +-
 net/dccp/ipv4.c| 3 ++-
 net/dccp/ipv6.c| 5 +++--
 net/ipv4/tcp_ipv4.c| 2 +-
 net/ipv6/tcp_ipv6.c| 5 +++--
 7 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 187cef7e56d5..ee54f21a8113 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -41,7 +41,7 @@ struct inet_connection_sock_af_ops {
int (*rebuild_header)(struct sock *sk);
void (*sk_rx_dst_set)(struct sock *sk, const struct sk_buff *skb);
int (*conn_request)(struct sock *sk, struct sk_buff *skb);
-   struct sock *(*syn_recv_sock)(struct sock *sk, struct sk_buff *skb,
+   struct sock *(*syn_recv_sock)(const struct sock *sk, struct sk_buff *skb,
  struct request_sock *req,
  struct dst_entry *dst);
u16 net_header_len;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 85995c1291d0..a1d2f5d6a430 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -454,7 +454,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
  struct request_sock *req,
  struct sk_buff *skb);
 void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst);
-struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
  struct request_sock *req,
  struct dst_entry *dst);
 int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb);
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 2409619b7043..e1f823451565 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -276,7 +276,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk,
 
 int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb);
 
-struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
+struct sock *dccp_v4_request_recv_sock(const struct sock *sk, struct sk_buff *skb,
   struct request_sock *req,
   struct dst_entry *dst);
 struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 00a14fa4270a..5b7818c63cec 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -390,7 +390,8 @@ static inline u64 dccp_v4_init_sequence(const struct sk_buff *skb)
  *
  * This is the equivalent of TCP's tcp_v4_syn_recv_sock
  */
-struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
+struct sock *dccp_v4_request_recv_sock(const struct sock *sk,
+  struct sk_buff *skb,
   struct request_sock *req,
   struct dst_entry *dst)
 {
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 0966bc08d362..e8753aa3b7a4 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -408,13 +408,14 @@ drop:
return -1;
 }
 
-static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
+static struct sock *dccp_v6_request_recv_sock(const struct sock *sk,
  struct sk_buff *skb,
  struct request_sock *req,
  struct dst_entry *dst)
 {
struct inet_request_sock *ireq = inet_rsk(req);
-   struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
+   struct ipv6_pinfo *newnp;
+   const struct ipv6_pinfo *np = inet6_sk(sk);
struct inet_sock *newinet;
struct dccp6_sock *newdp6;
struct sock *newsk;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 67c0dc8bddbf..ee0239e190cf 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1242,7 +1242,7 @@ EXPORT_SYMBOL(tcp_v4_conn_request);
  * The three way handshake has completed - we got a valid synack -
  * now create the new socket.
  */
-struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
  struct request_sock *req,
  struct dst_entry *dst)
 {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 092a23ef1feb..2330c7be6323 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -986,12 +986,13 @@ drop:
return 0; /* don't send reset */
 }
 
-static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, 

[PATCH net-next 12/14] tcp: constify tcp_v{4|6}_route_req() sock argument

2015-09-29 Thread Eric Dumazet
These functions do not change the listener socket.
Goal is to make sure tcp_conn_request() is not messing with
listener in a racy way.
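
As an illustration of what the const qualifier buys here, a minimal userspace
sketch follows. The demo_sock and demo_flow types and demo_route_req() are
hypothetical stand-ins, not kernel code; they only show that a callee taking
a const pointer can read the listener but cannot write through it without
the compiler complaining.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the field read below. */
struct demo_sock {
	int bound_dev_if;
};

/* Hypothetical stand-in for struct flowi. */
struct demo_flow {
	int oif;
};

/*
 * Read-only use of the listener. Because sk is const-qualified, a
 * statement such as "sk->bound_dev_if = 0;" inside this function
 * would be rejected at compile time.
 */
static int demo_route_req(const struct demo_sock *sk, struct demo_flow *fl)
{
	assert(sk != NULL && fl != NULL);
	fl->oif = sk->bound_dev_if;	/* reading through sk is allowed */
	return fl->oif;
}
```

The flow structure stays writable because it belongs to the request being
processed, while the shared listener is only ever read.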

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h   | 2 +-
 net/ipv4/tcp_ipv4.c | 3 ++-
 net/ipv6/tcp_ipv6.c | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5aa6672c6f5b..2c7dfe52f473 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1711,7 +1711,7 @@ struct tcp_request_sock_ops {
__u32 (*cookie_init_seq)(const struct sk_buff *skb,
 __u16 *mss);
 #endif
-   struct dst_entry *(*route_req)(struct sock *sk, struct flowi *fl,
+   struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
   const struct request_sock *req,
   bool *strict);
__u32 (*init_seq)(const struct sk_buff *skb);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ee0239e190cf..f551e9e862db 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1180,7 +1180,8 @@ static void tcp_v4_init_req(struct request_sock *req,
ireq->opt = tcp_v4_save_options(skb);
 }
 
-static struct dst_entry *tcp_v4_route_req(struct sock *sk, struct flowi *fl,
+static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
+ struct flowi *fl,
  const struct request_sock *req,
  bool *strict)
 {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2330c7be6323..97bc26e0cd0f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -689,7 +689,8 @@ static void tcp_v6_init_req(struct request_sock *req,
}
 }
 
-static struct dst_entry *tcp_v6_route_req(struct sock *sk, struct flowi *fl,
+static struct dst_entry *tcp_v6_route_req(const struct sock *sk,
+ struct flowi *fl,
  const struct request_sock *req,
  bool *strict)
 {
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 2/6] net: switchdev: move dev in switchdev_fdb_dump

2015-09-29 Thread Vivien Didelot
The FDB dump callback requires the related net_device so move it to the
struct switchdev_fdb_dump superset instead of using a callback param.

With this done, it'll be simpler to change the dump function signature.

Signed-off-by: Vivien Didelot 
---
 net/switchdev/switchdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 56d34ed..c0e2047 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -858,6 +858,7 @@ EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
 
 struct switchdev_fdb_dump {
struct switchdev_obj obj;
+   struct net_device *dev;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -887,7 +888,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
ndm->ndm_pad2= 0;
ndm->ndm_flags   = NTF_SELF;
ndm->ndm_type= 0;
-   ndm->ndm_ifindex = dev->ifindex;
+   ndm->ndm_ifindex = dump->dev->ifindex;
ndm->ndm_state   = obj->u.fdb.ndm_state;
 
if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
@@ -927,6 +928,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
.id = SWITCHDEV_OBJ_PORT_FDB,
.cb = switchdev_port_fdb_dump_cb,
},
+   .dev = dev,
.skb = skb,
.cb = cb,
.idx = idx,
-- 
2.5.3



Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer

2015-09-29 Thread Jeremy Harris

On 28/09/15 14:05, Alexander Stein wrote:

Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
Documentation/ABI/testing/sysfs-class-net does not state if this shall be
signed or unsigned.
Also remove the now unused variable fmt_udec.

[...]

-   ret = sprintf(buf, fmt_udec, ethtool_cmd_speed(&cmd));
+   ret = sprintf(buf, fmt_dec, ethtool_cmd_speed(&cmd));


If we print anything numeric, why is zero not appropriate
(which would still be unsigned)?
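
To make the two formats under discussion concrete, here is a minimal
userspace sketch. It assumes the "no link" speed is reported as (u32)-1, as
the quoted commit message implies; DEMO_SPEED_UNKNOWN and the demo_format_*
helpers are hypothetical names, not the sysfs code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "speed unknown" is encoded as all-ones in a 32-bit value. */
#define DEMO_SPEED_UNKNOWN ((uint32_t)-1)

/* Unsigned format: an unknown speed shows up as 4294967295. */
static int demo_format_unsigned(char *buf, size_t len, uint32_t speed)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%u\n", (unsigned int)speed);
}

/*
 * Signed format: the same bit pattern shows up as -1. The conversion of
 * an out-of-range value to int is implementation-defined in C; common
 * compilers wrap it to -1.
 */
static int demo_format_signed(char *buf, size_t len, uint32_t speed)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%d\n", (int)speed);
}
```

A speed of zero formats identically under both helpers, which is why
reporting zero would let the attribute stay unsigned.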
--
Cheers,
Jeremy


[PATCH net-next 3/6] net: switchdev: remove dev from switchdev_obj cb

2015-09-29 Thread Vivien Didelot
The net_device associated to a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.
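
The superset-struct pattern this relies on can be sketched in userspace as
follows. All demo_* names are hypothetical, and an integer ifindex stands in
for the net_device pointer; the point is only that the callback recovers its
context from the embedded object, so the driver side never has to pass the
device along.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic object handed to drivers; the callback takes only the object. */
struct demo_obj {
	int (*cb)(struct demo_obj *obj);
	int vid;
};

/* Superset struct owned by the caller: generic object plus context. */
struct demo_dump {
	struct demo_obj obj;
	int ifindex;		/* stands in for struct net_device *dev */
	int last_vid;
	int last_ifindex;
};

/* Caller-side callback: recovers the superset from the embedded object. */
static int demo_dump_cb(struct demo_obj *obj)
{
	struct demo_dump *dump = demo_container_of(obj, struct demo_dump, obj);

	dump->last_vid = obj->vid;
	dump->last_ifindex = dump->ifindex;
	return 0;
}

/* Hypothetical driver side: knows nothing about the superset struct. */
static int demo_driver_dump(struct demo_obj *obj)
{
	int vid, err;

	assert(obj != NULL && obj->cb != NULL);
	for (vid = 1; vid <= 3; vid++) {
		obj->vid = vid;
		err = obj->cb(obj);
		if (err)
			return err;
	}
	return 0;
}
```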

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 4 ++--
 include/net/switchdev.h              | 2 +-
 net/dsa/slave.c                      | 4 ++--
 net/switchdev/switchdev.c            | 6 ++----
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index d3f6632..78fd443 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
-   err = obj->cb(rocker_port->dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
if (rocker_vlan_id_is_internal(htons(vid)))
vlan->flags |= BRIDGE_VLAN_INFO_PVID;
vlan->vid_begin = vlan->vid_end = vid;
-   err = obj->cb(rocker_port->dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1820787..9ef7c56 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -66,7 +66,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
enum switchdev_obj_id id;
-   int (*cb)(struct net_device *dev, struct switchdev_obj *obj);
+   int (*cb)(struct switchdev_obj *obj);
union {
struct switchdev_obj_vlan { /* PORT_VLAN */
u16 flags;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f18cae5..0b47647 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,7 +334,7 @@ static int dsa_slave_port_vlan_dump(struct net_device *dev,
if (test_bit(p->port, untagged))
vlan->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
 
-   err = obj->cb(dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
@@ -397,7 +397,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
obj->u.fdb.vid = vid;
obj->u.fdb.ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
-   ret = obj->cb(dev, obj);
+   ret = obj->cb(obj);
if (ret < 0)
break;
}
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index c0e2047..93f4971 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -514,8 +514,7 @@ static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
return 0;
 }
 
-static int switchdev_port_vlan_dump_cb(struct net_device *dev,
-  struct switchdev_obj *obj)
+static int switchdev_port_vlan_dump_cb(struct switchdev_obj *obj)
 {
struct switchdev_vlan_dump *dump =
container_of(obj, struct switchdev_vlan_dump, obj);
@@ -864,8 +863,7 @@ struct switchdev_fdb_dump {
int idx;
 };
 
-static int switchdev_port_fdb_dump_cb(struct net_device *dev,
- struct switchdev_obj *obj)
+static int switchdev_port_fdb_dump_cb(struct switchdev_obj *obj)
 {
struct switchdev_fdb_dump *dump =
container_of(obj, struct switchdev_fdb_dump, obj);
-- 
2.5.3



[PATCH net-next 1/6] net: switchdev: remove dev in port_vlan_dump_put

2015-09-29 Thread Vivien Didelot
The static switchdev_port_vlan_dump_put function doesn't need the
net_device parameter, so remove it.

Signed-off-by: Vivien Didelot 
---
 net/switchdev/switchdev.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 00ee547..56d34ed 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -484,8 +484,7 @@ struct switchdev_vlan_dump {
u16 end;
 };
 
-static int switchdev_port_vlan_dump_put(struct net_device *dev,
-   struct switchdev_vlan_dump *dump)
+static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 {
struct bridge_vlan_info vinfo;
 
@@ -531,7 +530,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
for (dump->begin = dump->end = vlan->vid_begin;
 dump->begin <= vlan->vid_end;
 dump->begin++, dump->end++) {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
if (err)
return err;
}
@@ -543,7 +542,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
/* prepend */
dump->begin = vlan->vid_begin;
} else {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
dump->flags = vlan->flags;
dump->begin = vlan->vid_begin;
dump->end = vlan->vid_end;
@@ -555,7 +554,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
/* append */
dump->end = vlan->vid_end;
} else {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
dump->flags = vlan->flags;
dump->begin = vlan->vid_begin;
dump->end = vlan->vid_end;
@@ -588,7 +587,7 @@ static int switchdev_port_vlan_fill(struct sk_buff *skb, struct net_device *dev,
goto err_out;
if (filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)
/* last one */
-   err = switchdev_port_vlan_dump_put(dev, &dump);
+   err = switchdev_port_vlan_dump_put(&dump);
}
 
 err_out:
-- 
2.5.3



[PATCH net-next 06/14] inet: constify __inet_inherit_port() sock argument

2015-09-29 Thread Eric Dumazet
socket is not touched, make it const.

Signed-off-by: Eric Dumazet 
---
 include/net/inet_hashtables.h | 2 +-
 net/ipv4/inet_hashtables.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index b07d126694a7..3fb778d7c875 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -199,7 +199,7 @@ static inline int inet_sk_listen_hashfn(const struct sock *sk)
 }
 
 /* Caller must disable local BH processing. */
-int __inet_inherit_port(struct sock *sk, struct sock *child);
+int __inet_inherit_port(const struct sock *sk, struct sock *child);
 
 void inet_put_port(struct sock *sk);
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 89120196a949..56742e995dd3 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -126,7 +126,7 @@ void inet_put_port(struct sock *sk)
 }
 EXPORT_SYMBOL(inet_put_port);
 
-int __inet_inherit_port(struct sock *sk, struct sock *child)
+int __inet_inherit_port(const struct sock *sk, struct sock *child)
 {
struct inet_hashinfo *table = sk->sk_prot->h.hashinfo;
unsigned short port = inet_sk(child)->inet_num;
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 05/14] inet: constify inet_csk_route_child_sock() socket argument

2015-09-29 Thread Eric Dumazet
The socket points to the (shared) listener.

Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/inet_connection_sock.c| 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 00c3ced6ee55..187cef7e56d5 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -268,7 +268,8 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
 const struct request_sock *req);
-struct dst_entry *inet_csk_route_child_sock(struct sock *sk, struct sock *newsk,
+struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
+   struct sock *newsk,
const struct request_sock *req);
 
 static inline void inet_csk_reqsk_queue_add(struct sock *sk,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index ba2f90d90cb5..694a5e8f4f9f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -439,7 +439,7 @@ no_route:
 }
 EXPORT_SYMBOL_GPL(inet_csk_route_req);
 
-struct dst_entry *inet_csk_route_child_sock(struct sock *sk,
+struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
struct sock *newsk,
const struct request_sock *req)
 {
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 08/14] dccp: constify dccp_create_openreq_child() sock argument

2015-09-29 Thread Eric Dumazet
socket no longer needs to be read/write

Signed-off-by: Eric Dumazet 
---
 net/dccp/dccp.h  | 2 +-
 net/dccp/minisocks.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 8ed1df2771bd..2409619b7043 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -270,7 +270,7 @@ int dccp_reqsk_init(struct request_sock *rq, struct dccp_sock const *dp,
 
 int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
 
-struct sock *dccp_create_openreq_child(struct sock *sk,
+struct sock *dccp_create_openreq_child(const struct sock *sk,
   const struct request_sock *req,
   const struct sk_buff *skb);
 
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 9bfd0dc1e6cb..d10aace43672 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -72,7 +72,7 @@ void dccp_time_wait(struct sock *sk, int state, int timeo)
dccp_done(sk);
 }
 
-struct sock *dccp_create_openreq_child(struct sock *sk,
+struct sock *dccp_create_openreq_child(const struct sock *sk,
   const struct request_sock *req,
   const struct sk_buff *skb)
 {
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 07/14] net: constify sk_gfp_atomic() sock argument

2015-09-29 Thread Eric Dumazet
Signed-off-by: Eric Dumazet 
---
 include/net/sock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 94dff7f566f5..dfe2eb8e1132 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -759,7 +759,7 @@ static inline int sk_memalloc_socks(void)
 
 #endif
 
-static inline gfp_t sk_gfp_atomic(struct sock *sk, gfp_t gfp_mask)
+static inline gfp_t sk_gfp_atomic(const struct sock *sk, gfp_t gfp_mask)
 {
return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
 }
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 02/14] tcp: remove unused len argument from tcp_rcv_state_process()

2015-09-29 Thread Eric Dumazet
Once we realize tcp_rcv_synsent_state_process() does not use
its 'len' argument and we get rid of it, then it becomes clear
this argument is no longer used in tcp_rcv_state_process()

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h| 2 +-
 net/ipv4/tcp_input.c | 6 +++---
 net/ipv4/tcp_ipv4.c  | 2 +-
 net/ipv4/tcp_minisocks.c | 3 +--
 net/ipv6/tcp_ipv6.c  | 2 +-
 5 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cdbf63d3c5cf..1cfdedbe47e1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -366,7 +366,7 @@ void tcp_write_timer_handler(struct sock *sk);
 void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
- const struct tcphdr *th, unsigned int len);
+ const struct tcphdr *th);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 const struct tcphdr *th, unsigned int len);
 void tcp_rcv_space_adjust(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4964d53907e9..dcbddf12f4b3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5472,7 +5472,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 }
 
 static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
-const struct tcphdr *th, unsigned int len)
+const struct tcphdr *th)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
@@ -5699,7 +5699,7 @@ reset_and_undo:
  */
 
 int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
- const struct tcphdr *th, unsigned int len)
+ const struct tcphdr *th)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -5749,7 +5749,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
goto discard;
 
case TCP_SYN_SENT:
-   queued = tcp_rcv_synsent_state_process(sk, skb, th, len);
+   queued = tcp_rcv_synsent_state_process(sk, skb, th);
if (queued >= 0)
return queued;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4300d0132b9f..7e5ae1e01009 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1420,7 +1420,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb);
 
-   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) {
+   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb))) {
rsk = sk;
goto reset;
}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index e4fe62b6b106..9c7c61cf7462 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -821,8 +821,7 @@ int tcp_child_process(struct sock *parent, struct sock *child,
int state = child->sk_state;
 
if (!sock_owned_by_user(child)) {
-   ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb),
-   skb->len);
+   ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb));
/* Wakeup parent, send SIGIO */
if (state == TCP_SYN_RECV && child->sk_state != state)
parent->sk_data_ready(parent);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c47e5c87a2a8..b6e473f0f62e 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1272,7 +1272,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb);
 
-   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len))
+   if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb)))
goto reset;
if (opt_skb)
goto ipv6_pktoptions;
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH net-next 01/14] tcp/dccp: constify send_synack and send_reset socket argument

2015-09-29 Thread Eric Dumazet
None of these functions need to change the socket, make it
const.

Signed-off-by: Eric Dumazet 
---
 include/net/request_sock.h |  4 ++--
 net/dccp/dccp.h            |  2 +-
 net/dccp/ipv4.c            |  2 +-
 net/dccp/ipv6.c            |  2 +-
 net/dccp/minisocks.c       |  2 +-
 net/ipv4/tcp_ipv4.c        |  4 ++--
 net/ipv6/tcp_ipv6.c        | 12 ++++++------
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 181f97f9fe1c..90247ec7955b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -34,9 +34,9 @@ struct request_sock_ops {
char*slab_name;
int (*rtx_syn_ack)(const struct sock *sk,
   struct request_sock *req);
-   void(*send_ack)(struct sock *sk, struct sk_buff *skb,
+   void(*send_ack)(const struct sock *sk, struct sk_buff *skb,
struct request_sock *req);
-   void(*send_reset)(struct sock *sk,
+   void(*send_reset)(const struct sock *sk,
  struct sk_buff *skb);
void(*destructor)(struct request_sock *req);
void(*syn_ack_timeout)(const struct request_sock *req);
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 31e96df500d1..8ed1df2771bd 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -229,7 +229,7 @@ void dccp_v4_send_check(struct sock *sk, struct sk_buff *skb);
 int dccp_retransmit_skb(struct sock *sk);
 
 void dccp_send_ack(struct sock *sk);
-void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
+void dccp_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 struct request_sock *rsk);
 
 void dccp_send_sync(struct sock *sk, const u64 seq,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index a46ae9c69ccf..00a14fa4270a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -527,7 +527,7 @@ out:
return err;
 }
 
-static void dccp_v4_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb)
+static void dccp_v4_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb)
 {
int err;
const struct iphdr *rxiph;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 4fa199dc69a3..aa719e700961 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -234,7 +234,7 @@ static void dccp_v6_reqsk_destructor(struct request_sock *req)
kfree_skb(inet_rsk(req)->pktopts);
 }
 
-static void dccp_v6_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb)
+static void dccp_v6_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb)
 {
const struct ipv6hdr *rxip6h;
struct sk_buff *skb;
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 838f524cf11a..9bfd0dc1e6cb 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -236,7 +236,7 @@ int dccp_child_process(struct sock *parent, struct sock *child,
 
 EXPORT_SYMBOL_GPL(dccp_child_process);
 
-void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
+void dccp_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 struct request_sock *rsk)
 {
DCCP_BUG("DCCP-ACK packets are never sent in LISTEN/RESPOND state");
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a23ba7daecbf..4300d0132b9f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -576,7 +576,7 @@ EXPORT_SYMBOL(tcp_v4_send_check);
  * Exception: precedence violation. We do not implement it in any case.
  */
 
-static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
+static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 {
const struct tcphdr *th = tcp_hdr(skb);
struct {
@@ -795,7 +795,7 @@ static void tcp_v4_timewait_ack(struct sock *sk, struct sk_buff *skb)
inet_twsk_put(tw);
 }
 
-static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
+static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
  struct request_sock *req)
 {
/* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 16fb299dcab8..c47e5c87a2a8 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -70,8 +70,8 @@
 #include 
 #include 
 
-static void	tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb);
-static void	tcp_v6_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
+static void	tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb);
+static void	tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
  struct request_sock *req);
 
 static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb);
@@ -724,7 +724,7 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = {
.queue_hash_add =   

[PATCH net-next 6/6] net: switchdev: extract struct switchdev_obj_*

2015-09-29 Thread Vivien Didelot
Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rid of this
outer structure.

Signed-off-by: Vivien Didelot 
---
 include/net/switchdev.h | 53 -
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 230fcfc..8a3bacc 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -64,30 +64,29 @@ enum switchdev_obj_id {
SWITCHDEV_OBJ_PORT_FDB,
 };
 
-struct switchdev_obj {
-   enum switchdev_obj_id id;
-   int (*cb)(struct switchdev_obj *obj);
-   union {
-   struct switchdev_obj_vlan { /* PORT_VLAN */
-   u16 flags;
-   u16 vid_begin;
-   u16 vid_end;
-   } vlan;
-   struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
-   u32 dst;
-   int dst_len;
-   struct fib_info *fi;
-   u8 tos;
-   u8 type;
-   u32 nlflags;
-   u32 tb_id;
-   } ipv4_fib;
-   struct switchdev_obj_fdb {  /* PORT_FDB */
-   const unsigned char *addr;
-   u16 vid;
-   u16 ndm_state;
-   } fdb;
-   } u;
+/* SWITCHDEV_OBJ_PORT_VLAN */
+struct switchdev_obj_vlan {
+   u16 flags;
+   u16 vid_begin;
+   u16 vid_end;
+};
+
+/* SWITCHDEV_OBJ_IPV4_FIB */
+struct switchdev_obj_ipv4_fib {
+   u32 dst;
+   int dst_len;
+   struct fib_info *fi;
+   u8 tos;
+   u8 type;
+   u32 nlflags;
+   u32 tb_id;
+};
+
+/* SWITCHDEV_OBJ_PORT_FDB */
+struct switchdev_obj_fdb {
+   const unsigned char *addr;
+   u16 vid;
+   u16 ndm_state;
 };
 
 void switchdev_trans_item_enqueue(struct switchdev_trans *trans,
@@ -102,11 +101,11 @@ void *switchdev_trans_item_dequeue(struct switchdev_trans *trans);
  *
  * @switchdev_port_attr_set: Set a port attribute (see switchdev_attr).
  *
- * @switchdev_port_obj_add: Add an object to port (see switchdev_obj).
+ * @switchdev_port_obj_add: Add an object to port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj).
+ * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj).
+ * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj_*).
  */
 struct switchdev_ops {
int (*switchdev_port_attr_get)(struct net_device *dev,
-- 
2.5.3



[PATCH v2] net/mlx4: Handle return codes in mlx4_qp_attach_common

2015-09-29 Thread Robb Manes
Both new_steering_entry() and existing_steering_entry() return values
based on their success or failure, but currently they fall through
silently.  This can make troubleshooting difficult, as we were unable
to tell which one of these two functions returned errors or
specifically what code was returned.  This patch remedies that
situation by passing the return codes to err, which is returned by
mlx4_qp_attach_common() itself.

This also addresses a leak in the call to mlx4_bitmap_free() as well.
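
The control-flow change can be sketched in isolation as below. The demo_*
helpers are hypothetical and return made-up error conditions; they are not
the mlx4 steering functions. The sketch only shows that once the helper's
return value is stored in err, the existing "if (err ...)" cleanup path can
react to it and the caller sees the actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helpers standing in for the two steering functions. */
static int demo_new_entry(int index)
{
	return index < 0 ? -EINVAL : 0;
}

static int demo_existing_entry(int index)
{
	return index > 100 ? -ENOMEM : 0;
}

/*
 * Capture the helper's return code instead of dropping it, so the
 * cleanup branch runs on failure and the code is returned to the caller.
 */
static int demo_attach(int index, int is_new, int *rolled_back)
{
	int err;

	assert(rolled_back != NULL);
	*rolled_back = 0;

	if (is_new)
		err = demo_new_entry(index);
	else
		err = demo_existing_entry(index);

	if (err)
		*rolled_back = 1;	/* cleanup now sees the failure */

	return err;
}
```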

Signed-off-by: Robb Manes 
---
Sorry about the poor formatting; I should have used git-send properly.

 drivers/net/ethernet/mellanox/mlx4/mcg.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index bd9ea0d..1d4e2e0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -1184,10 +1184,11 @@ out:
if (prot == MLX4_PROT_ETH) {
/* manage the steering entry for promisc mode */
if (new_entry)
-   new_steering_entry(dev, port, steer, index, qp->qpn);
+   err = new_steering_entry(dev, port, steer,
+index, qp->qpn);
else
-   existing_steering_entry(dev, port, steer,
-   index, qp->qpn);
+   err = existing_steering_entry(dev, port, steer,
+ index, qp->qpn);
}
if (err && link && index != -1) {
if (index < dev->caps.num_mgms)
-- 
2.4.3



question about potential integer truncation in mwifiex_set_wapi_ie and mwifiex_set_wps_ie

2015-09-29 Thread PaX Team
hi all,

in drivers/net/wireless/mwifiex/sta_ioctl.c the following functions

mwifiex_set_wpa_ie_helper
mwifiex_set_wapi_ie
mwifiex_set_wps_ie

can truncate the incoming ie_len argument from u16 to u8 when it gets
stored in mwifiex_private.wpa_ie_len, mwifiex_private.wapi_ie_len and
mwifiex_private.wps_ie_len, respectively. based on some light code
reading it seems a length value of 256 is valid (IEEE_MAX_IE_SIZE and
MWIFIEX_MAX_VSIE_LEN seem to limit it) and thus would get truncated
to 0 when stored in those u8 fields. the question is whether this is
intentional or a bug somewhere.
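
a minimal userspace sketch of the truncation described above, with
hypothetical demo_* names rather than the mwifiex structures. the checked
variant is only one possible way to avoid the wraparound and is an assumption
of this sketch, not something the driver is known to do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private struct with an 8-bit length field. */
struct demo_priv {
	uint8_t ie_len;
};

/* Unchecked store: a u16 length of 256 wraps to 0 in the u8 field. */
static void demo_set_ie_unchecked(struct demo_priv *p, uint16_t ie_len)
{
	assert(p != NULL);
	p->ie_len = (uint8_t)ie_len;
}

/* Checked store: refuse lengths that do not fit into the u8 field. */
static int demo_set_ie_checked(struct demo_priv *p, uint16_t ie_len)
{
	assert(p != NULL);
	if (ie_len > UINT8_MAX)
		return -1;
	p->ie_len = (uint8_t)ie_len;
	return 0;
}
```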

FTR, this issue was detected with the upcoming version of the size overflow
plugin we have in PaX/grsecurity and there're a handful of similar cases in
the tree where potentially unwanted or unnecessary integer truncations occur,
this being one of these. any opinion/help is welcome!

cheers,
  PaX Team



[PATCH net-next 5/6] net: switchdev: abstract object in add/del ops

2015-09-29 Thread Vivien Didelot
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev add and del operations to:

int switchdev_port_obj_add/del(struct net_device *dev,
   enum switchdev_obj_id id, void *obj);

This allows the caller to pass a specific switchdev_obj_* structure
instead of the generic switchdev_obj one.

The driver implementations of these operations and switchdev itself have
been changed accordingly.
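
The shape of the new interface can be sketched in userspace as follows. The
demo_* enum, structs and handlers are hypothetical and return arbitrary
values for illustration; only the idea of dispatching on an id and passing
the matching structure through an untyped pointer is taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_obj_id {
	DEMO_OBJ_PORT_VLAN,
	DEMO_OBJ_PORT_FDB,
	DEMO_OBJ_OTHER,
};

struct demo_obj_vlan {
	uint16_t vid_begin;
	uint16_t vid_end;
};

struct demo_obj_fdb {
	const unsigned char *addr;
	uint16_t vid;
};

/* Hypothetical handlers: return the number of VLANs, or the FDB vid. */
static int demo_vlans_add(const struct demo_obj_vlan *vlan)
{
	return vlan->vid_end - vlan->vid_begin + 1;
}

static int demo_fdb_add(const struct demo_obj_fdb *fdb)
{
	return fdb->vid;
}

/*
 * The id selects how the untyped pointer is interpreted; in C a
 * const void * converts implicitly to the specific structure pointer.
 */
static int demo_port_obj_add(enum demo_obj_id id, const void *obj)
{
	assert(obj != NULL);

	switch (id) {
	case DEMO_OBJ_PORT_VLAN:
		return demo_vlans_add(obj);
	case DEMO_OBJ_PORT_FDB:
		return demo_fdb_add(obj);
	default:
		return -EOPNOTSUPP;
	}
}
```

The trade-off is that the compiler can no longer check that the id and the
pointed-to structure agree, so the caller is responsible for pairing them
correctly.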

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c |  21 +++---
 include/net/switchdev.h  |  18 --
 net/bridge/br_fdb.c  |  11 ++--
 net/bridge/br_vlan.c |  24 +++
 net/dsa/slave.c  |  20 +++---
 net/switchdev/switchdev.c| 122 ---
 6 files changed, 99 insertions(+), 117 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 107adb6..9773f5b 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4437,26 +4437,25 @@ static int rocker_port_fdb_add(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_obj_add(struct net_device *dev,
-  struct switchdev_obj *obj,
+  enum switchdev_obj_id id, const void *obj,
   struct switchdev_trans *trans)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_add(rocker_port, trans,
-   &obj->u.vlan);
+   err = rocker_port_vlans_add(rocker_port, trans, obj);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = obj;
err = rocker_port_fib_ipv4(rocker_port, trans,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id, 0);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_add(rocker_port, trans, &obj->u.fdb);
+   err = rocker_port_fdb_add(rocker_port, trans, obj);
break;
default:
err = -EOPNOTSUPP;
@@ -4509,25 +4508,25 @@ static int rocker_port_fdb_del(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_obj_del(struct net_device *dev,
-  struct switchdev_obj *obj)
+  enum switchdev_obj_id id, const void *obj)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_del(rocker_port, &obj->u.vlan);
+   err = rocker_port_vlans_del(rocker_port, obj);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = obj;
err = rocker_port_fib_ipv4(rocker_port, NULL,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id,
   ROCKER_OP_FLAG_REMOVE);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_del(rocker_port, NULL, &obj->u.fdb);
+   err = rocker_port_fdb_del(rocker_port, NULL, obj);
break;
default:
err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 0a80f2a..230fcfc 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -115,10 +115,12 @@ struct switchdev_ops {
   struct switchdev_attr *attr,
   struct switchdev_trans *trans);
int (*switchdev_port_obj_add)(struct net_device *dev,
- struct switchdev_obj *obj,
+ enum switchdev_obj_id id,
+ const void *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
- struct switchdev_obj *obj);
+ enum switchdev_obj_id id,
+ const void *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
   enum switchdev_obj_id id, void *obj,
   int (*cb)(void *obj));
@@ -151,8 +153,10 @@ int 

[PATCH v2 net-next 5/6] net: switchdev: abstract object in add/del ops

2015-09-29 Thread Vivien Didelot
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev add and del operations to:

int switchdev_port_obj_add/del(struct net_device *dev,
   enum switchdev_obj_id id, void *obj);

This allows the caller to pass a specific switchdev_obj_* structure
instead of the generic switchdev_obj one.

The driver implementations of these operations, and switchdev itself,
have been changed accordingly.

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c |  21 +++---
 include/net/switchdev.h  |  18 --
 net/bridge/br_fdb.c  |  11 ++--
 net/bridge/br_vlan.c |  24 +++
 net/dsa/slave.c  |  20 +++---
 net/switchdev/switchdev.c| 122 ---
 6 files changed, 99 insertions(+), 117 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 107adb6..9773f5b 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4437,26 +4437,25 @@ static int rocker_port_fdb_add(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_obj_add(struct net_device *dev,
-  struct switchdev_obj *obj,
+  enum switchdev_obj_id id, const void *obj,
   struct switchdev_trans *trans)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_add(rocker_port, trans,
-   &obj->u.vlan);
+   err = rocker_port_vlans_add(rocker_port, trans, obj);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = obj;
err = rocker_port_fib_ipv4(rocker_port, trans,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id, 0);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_add(rocker_port, trans, &obj->u.fdb);
+   err = rocker_port_fdb_add(rocker_port, trans, obj);
break;
default:
err = -EOPNOTSUPP;
@@ -4509,25 +4508,25 @@ static int rocker_port_fdb_del(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_obj_del(struct net_device *dev,
-  struct switchdev_obj *obj)
+  enum switchdev_obj_id id, const void *obj)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_del(rocker_port, &obj->u.vlan);
+   err = rocker_port_vlans_del(rocker_port, obj);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = obj;
err = rocker_port_fib_ipv4(rocker_port, NULL,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id,
   ROCKER_OP_FLAG_REMOVE);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_del(rocker_port, NULL, &obj->u.fdb);
+   err = rocker_port_fdb_del(rocker_port, NULL, obj);
break;
default:
err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index a2f57fb..bcadac3 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -115,10 +115,12 @@ struct switchdev_ops {
   struct switchdev_attr *attr,
   struct switchdev_trans *trans);
int (*switchdev_port_obj_add)(struct net_device *dev,
- struct switchdev_obj *obj,
+ enum switchdev_obj_id id,
+ const void *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
- struct switchdev_obj *obj);
+ enum switchdev_obj_id id,
+ const void *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
   enum switchdev_obj_id id, void *obj,
   int (*cb)(void *obj));
@@ -151,8 +153,10 @@ int 

Re: [RFT] geneve: implement support for IPv6-based tunnels

2015-09-29 Thread Jiri Benc
On Mon, 28 Sep 2015 15:20:33 -0400, John W. Linville wrote:
> > To be really useful, geneve should open both IPv4 and IPv6 socket when
> > it's metadata based. Take a look at my recent patchset that does this
> > for vxlan: http://thread.gmane.org/gmane.linux.network/379282
> 
> OK, that seems simple enough.  So we should just assume that a metadata
> tunnel could do either protocol at any time?  Or are there more rules
> than that?

That should be it, on egress. On ingress, udp_tun_rx_dst needs to be
called with the appropriate family which seems to be missing from your
patch, too (there's AF_INET unconditionally, currently).

 Jiri

-- 
Jiri Benc


Re: [PATCH net-next] net: dsa: fix preparation of a port STP update

2015-09-29 Thread Andrew Lunn
On Tue, Sep 29, 2015 at 12:38:36PM -0400, Vivien Didelot wrote:
> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
> 
> This happened on a 6185 switch which does not support hardware bridging.
> 
> Reported-by: Andrew Lunn 
> Signed-off-by: Vivien Didelot 

Acked-by: Andrew Lunn 
Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set")

David:
  This should be included in the next -rc.

Thanks
Andrew

> ---
>  net/dsa/slave.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 0ae427c..02a3af8 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -453,12 +453,17 @@ static int dsa_slave_port_attr_set(struct net_device 
> *dev,
>  struct switchdev_attr *attr,
>  struct switchdev_trans *trans)
>  {
> - int ret = 0;
> + struct dsa_slave_priv *p = netdev_priv(dev);
> + struct dsa_switch *ds = p->parent;
> + int ret;
>  
>   switch (attr->id) {
>   case SWITCHDEV_ATTR_PORT_STP_STATE:
> - if (switchdev_trans_ph_commit(trans))
> - ret = dsa_slave_stp_update(dev, attr->u.stp_state);
> + if (switchdev_trans_ph_prepare(trans))
> + ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
> + else
> + ret = ds->drv->port_stp_update(ds, p->port,
> +attr->u.stp_state);
>   break;
>   default:
>   ret = -EOPNOTSUPP;
> -- 
> 2.6.0
> 


Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Alexander Duyck

On 09/29/2015 08:48 AM, Jesper Dangaard Brouer wrote:

Make it possible to free a freelist with several objects by adjusting
the API of slab_free() and __slab_free() to take a head, a tail and an
object counter (cnt).

A NULL tail indicates a single-object free of the head object.  This
allows the compiler's inline constant propagation in slab_free() and
slab_free_freelist_hook() to avoid adding any overhead in the case of
a single-object free.

This allows a freelist with several objects (all within the same
slab-page) to be free'ed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().

Object debugging on the free path is also extended to handle these
freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.

These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.

Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Alexander Duyck 

---
V4:
  - Change API per req of Christoph Lameter
  - Remove comments in init_object.

  mm/slub.c |   87 -
  1 file changed, 69 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1cf98d89546d..7c2abc33fd4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1063,11 +1063,15 @@ bad:
return 0;
  }

+/* Supports checking bulk free of a constructed freelist */
  static noinline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags)
  {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+   void *object = head;
+   int cnt = 0;

spin_lock_irqsave(&n->list_lock, *flags);
slab_lock(page);
@@ -1075,6 +1079,9 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (!check_slab(s, page))
goto fail;

+next_object:
+   cnt++;
+
if (!check_valid_pointer(s, page, object)) {
slab_err(s, page, "Invalid object pointer 0x%p", object);
goto fail;
@@ -1105,8 +1112,19 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (s->flags & SLAB_STORE_USER)
set_track(s, object, TRACK_FREE, addr);
trace(s, page, object, 0);
+   /* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
init_object(s, object, SLUB_RED_INACTIVE);
+
+   /* Reached end of constructed freelist yet? */
+   if (object != tail) {
+   object = get_freepointer(s, object);
+   goto next_object;
+   }
  out:
+   if (cnt != bulk_cnt)
+   slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
+bulk_cnt, cnt);
+
slab_unlock(page);
/*
 * Keep node_lock to preserve integrity
@@ -1210,7 +1228,8 @@ static inline int alloc_debug_processing(struct 
kmem_cache *s,
struct page *page, void *object, unsigned long addr) { return 0; }

  static inline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags) { return NULL; }

  static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
@@ -1306,6 +1325,31 @@ static inline void slab_free_hook(struct kmem_cache *s, 
void *x)
kasan_slab_free(s, x);
  }

+/* Compiler cannot detect that slab_free_freelist_hook() can be
+ * removed if slab_free_hook() evaluates to nothing.  Thus, we need to
+ * catch all relevant config debug options here.
+ */


Is it actually generating nothing but a pointer walking loop or is there 
a bit of code cruft that is being evaluated inside the loop?



+#if defined(CONFIG_KMEMCHECK) ||   \
+   defined(CONFIG_LOCKDEP) ||  \
+   defined(CONFIG_DEBUG_KMEMLEAK) ||   \
+   defined(CONFIG_DEBUG_OBJECTS_FREE) ||   \
+   defined(CONFIG_KASAN)
+static inline void slab_free_freelist_hook(struct kmem_cache *s,
+  void *head, void *tail)
+{
+   void *object = head;
+   void *tail_obj = tail ? : head;
+
+   do {
+   slab_free_hook(s, object);
+   } while ((object != tail_obj) &&
+(object = get_freepointer(s, object)));
+}
+#else
+static inline void slab_free_freelist_hook(struct kmem_cache *s, void 
*obj_tail,
+  void *freelist_head) {}
+#endif
+


Instead of messing around with an #else you might just wrap 

[MM PATCH V4 3/6] slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG

2015-09-29 Thread Jesper Dangaard Brouer
The #ifdef of CONFIG_SLUB_DEBUG is located very far from
the associated #else.  For readability mark it with a comment.

Signed-off-by: Jesper Dangaard Brouer 
Acked-by: Christoph Lameter 
---
 mm/slub.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 024eed32da2c..1cf98d89546d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1202,7 +1202,7 @@ unsigned long kmem_cache_flags(unsigned long object_size,
 
return flags;
 }
-#else
+#else /* !CONFIG_SLUB_DEBUG */
 static inline void setup_object_debug(struct kmem_cache *s,
struct page *page, void *object) {}
 



[MM PATCH V4 0/6] Further optimizing SLAB/SLUB bulking

2015-09-29 Thread Jesper Dangaard Brouer
The most important part of this patchset is the introduction of what I
call a detached freelist, which improves SLUB performance of object
freeing in the "slowpath" of kmem_cache_free_bulk.

Tagging patchset with "V4" to avoid confusion with "V2":
 (V2) http://thread.gmane.org/gmane.linux.kernel.mm/137469

Addressing comments from:
 ("V3") http://thread.gmane.org/gmane.linux.kernel.mm/139268

I've added Christoph Lameter's ACKs from prev review.
 * Only patch 5 is changed significantly and needs review.
 * Benchmarked, performance is the same

Notes for patches:
 * First two patches (from Christoph) are already in AKPM MMOTS.
 * Patch 3 is trivial
 * Patch 4 is a repost, implements bulking for SLAB.
  - http://thread.gmane.org/gmane.linux.kernel.mm/138220
 * Patch 5 and 6 are the important patches
  - Patch 5 handles "freelists" in slab_free() and __slab_free().
  - Patch 6 introduces detached freelists, with a significant performance improvement

Patches should be ready for the MM-tree, as I'm now handling kmem
debug support.


Based on top of commit 519f526d39 in net-next, but I've tested it
applies on top of mmotm-2015-09-18-16-08.

The benchmarking tools are available here:
 https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
 See: slab_bulk_test0{1,2,3}.c

This was joint work with Alexander Duyck while still at Red Hat.

This patchset is part of my network stack use-case.  I'll post the
network side of the patchset as soon as I've cleaned it up, rebased it
on net-next and re-run all the benchmarks.

---

Christoph Lameter (2):
  slub: create new ___slab_alloc function that can be called with irqs disabled
  slub: Avoid irqoff/on in bulk allocation

Jesper Dangaard Brouer (4):
  slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
  slab: implement bulking for SLAB allocator
  slub: support for bulk free with SLUB freelists
  slub: optimize bulk slowpath free by detached freelist


 mm/slab.c |   87 ++--
 mm/slub.c |  263 +++--
 2 files changed, 249 insertions(+), 101 deletions(-)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



[PATCH v2 net-next 2/6] net: switchdev: move dev in switchdev_fdb_dump

2015-09-29 Thread Vivien Didelot
The FDB dump callback requires the related net_device so move it to the
struct switchdev_fdb_dump superset instead of using a callback param.

With this done, it'll be simpler to change the dump function signature.

Signed-off-by: Vivien Didelot 
---
 net/switchdev/switchdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 56d34ed..c0e2047 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -858,6 +858,7 @@ EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
 
 struct switchdev_fdb_dump {
struct switchdev_obj obj;
+   struct net_device *dev;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -887,7 +888,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device 
*dev,
ndm->ndm_pad2= 0;
ndm->ndm_flags   = NTF_SELF;
ndm->ndm_type= 0;
-   ndm->ndm_ifindex = dev->ifindex;
+   ndm->ndm_ifindex = dump->dev->ifindex;
ndm->ndm_state   = obj->u.fdb.ndm_state;
 
if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
@@ -927,6 +928,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb,
.id = SWITCHDEV_OBJ_PORT_FDB,
.cb = switchdev_port_fdb_dump_cb,
},
+   .dev = dev,
.skb = skb,
.cb = cb,
.idx = idx,
-- 
2.6.0



[PATCH net-next] net: dsa: fix preparation of a port STP update

2015-09-29 Thread Vivien Didelot
Because of the default 0 value of ret in dsa_slave_port_attr_set, a
driver may return -EOPNOTSUPP from the commit phase of a STP state,
which triggers a WARN() from switchdev.

This happened on a 6185 switch which does not support hardware bridging.
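
The prepare/commit split that the fix relies on can be modelled in user space roughly as below. This is a sketch under assumptions: `demo_phase`, `demo_driver` and `demo_stp_attr_set()` are hypothetical names, and only the idea (report missing support during prepare, do the work during commit) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_phase { DEMO_PH_PREPARE, DEMO_PH_COMMIT };

/* Hypothetical driver ops table; the callback may be absent (NULL). */
struct demo_driver {
	int (*port_stp_update)(int port, int state);
};

static int demo_last_state = -1;

/* Sample callback that just records the requested state. */
static int demo_stp_update(int port, int state)
{
	(void)port;
	demo_last_state = state;
	return 0;
}

static int demo_stp_attr_set(const struct demo_driver *drv,
			     enum demo_phase phase, int port, int state)
{
	assert(drv != NULL);

	/* Prepare: only report whether the operation can be done at all. */
	if (phase == DEMO_PH_PREPARE)
		return drv->port_stp_update ? 0 : -EOPNOTSUPP;

	/* Commit: the caller is expected to skip this if prepare failed. */
	return drv->port_stp_update(port, state);
}
```

With this shape, a driver without the callback fails in the prepare phase, so the commit phase is never asked to return -EOPNOTSUPP.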

Reported-by: Andrew Lunn 
Signed-off-by: Vivien Didelot 
---
 net/dsa/slave.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0ae427c..02a3af8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -453,12 +453,17 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
   struct switchdev_attr *attr,
   struct switchdev_trans *trans)
 {
-   int ret = 0;
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_switch *ds = p->parent;
+   int ret;
 
switch (attr->id) {
case SWITCHDEV_ATTR_PORT_STP_STATE:
-   if (switchdev_trans_ph_commit(trans))
-   ret = dsa_slave_stp_update(dev, attr->u.stp_state);
+   if (switchdev_trans_ph_prepare(trans))
+   ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
+   else
+   ret = ds->drv->port_stp_update(ds, p->port,
+  attr->u.stp_state);
break;
default:
ret = -EOPNOTSUPP;
-- 
2.6.0



[PATCH v2 net-next 1/6] net: switchdev: remove dev in port_vlan_dump_put

2015-09-29 Thread Vivien Didelot
The static switchdev_port_vlan_dump_put function does not need the
net_device parameter, so remove it.

Signed-off-by: Vivien Didelot 
---
 net/switchdev/switchdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 00ee547..56d34ed 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -484,8 +484,7 @@ struct switchdev_vlan_dump {
u16 end;
 };
 
-static int switchdev_port_vlan_dump_put(struct net_device *dev,
-   struct switchdev_vlan_dump *dump)
+static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 {
struct bridge_vlan_info vinfo;
 
@@ -531,7 +530,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device 
*dev,
for (dump->begin = dump->end = vlan->vid_begin;
 dump->begin <= vlan->vid_end;
 dump->begin++, dump->end++) {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
if (err)
return err;
}
@@ -543,7 +542,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device 
*dev,
/* prepend */
dump->begin = vlan->vid_begin;
} else {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
dump->flags = vlan->flags;
dump->begin = vlan->vid_begin;
dump->end = vlan->vid_end;
@@ -555,7 +554,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device 
*dev,
/* append */
dump->end = vlan->vid_end;
} else {
-   err = switchdev_port_vlan_dump_put(dev, dump);
+   err = switchdev_port_vlan_dump_put(dump);
dump->flags = vlan->flags;
dump->begin = vlan->vid_begin;
dump->end = vlan->vid_end;
@@ -588,7 +587,7 @@ static int switchdev_port_vlan_fill(struct sk_buff *skb, 
struct net_device *dev,
goto err_out;
if (filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)
/* last one */
-   err = switchdev_port_vlan_dump_put(dev, &dump);
+   err = switchdev_port_vlan_dump_put(&dump);
}
 
 err_out:
-- 
2.6.0



[PATCH v2 net-next 4/6] net: switchdev: pass callback to dump operation

2015-09-29 Thread Vivien Didelot
Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev dump operation to:

int switchdev_port_obj_dump(struct net_device *dev,
enum switchdev_obj_id id, void *obj,
int (*cb)(void *obj));

This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.

Driver implementations of the dump operation can now expect this
specific structure and call the callback with it. Drivers have been
changed accordingly.
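
A rough user-space sketch of this callback-based dump follows. The `demo_*` names are hypothetical; the sketch assumes the caller embeds the object as the first member of a larger "superset" struct, so the callback can recover its own state from the object pointer (kernel code would more typically use container_of()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object type handed to the driver. */
struct demo_obj_vlan {
	uint16_t vid_begin;
	uint16_t vid_end;
};

/* Caller-side superset: the object plus the caller's private state. */
struct demo_vlan_dump {
	struct demo_obj_vlan vlan;	/* must stay the first member */
	int seen;
	uint16_t last_vid;
};

/* Caller's callback: obj points at dump->vlan, i.e. at the dump itself. */
static int demo_vlan_dump_cb(void *obj)
{
	struct demo_vlan_dump *dump = obj;

	dump->seen++;
	dump->last_vid = dump->vlan.vid_end;
	return 0;
}

/* "Driver" side: fill the specific object and call back once per entry. */
static int demo_port_vlan_dump(const uint16_t *vids, size_t n,
			       struct demo_obj_vlan *vlan,
			       int (*cb)(void *obj))
{
	int err = 0;

	assert(vlan != NULL && cb != NULL);

	for (size_t i = 0; i < n; i++) {
		vlan->vid_begin = vlan->vid_end = vids[i];
		err = cb(vlan);
		if (err)
			break;	/* stop at the first callback error */
	}
	return err;
}
```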

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 21 +
 include/net/switchdev.h  |  9 +---
 net/dsa/slave.c  | 26 +++--
 net/switchdev/switchdev.c| 45 ++--
 4 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 78fd443..107adb6 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4538,10 +4538,10 @@ static int rocker_port_obj_del(struct net_device *dev,
 }
 
 static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
-   struct switchdev_obj *obj)
+   struct switchdev_obj_fdb *fdb,
+   int (*cb)(void *obj))
 {
struct rocker *rocker = rocker_port->rocker;
-   struct switchdev_obj_fdb *fdb = &obj->u.fdb;
struct rocker_fdb_tbl_entry *found;
struct hlist_node *tmp;
unsigned long lock_flags;
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
-   err = obj->cb(obj);
+   err = cb(fdb);
if (err)
break;
}
@@ -4566,9 +4566,9 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
-struct switchdev_obj *obj)
+struct switchdev_obj_vlan *vlan,
+   int (*cb)(void *obj))
 {
-   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
u16 vid;
int err = 0;
 
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port 
*rocker_port,
if (rocker_vlan_id_is_internal(htons(vid)))
vlan->flags |= BRIDGE_VLAN_INFO_PVID;
vlan->vid_begin = vlan->vid_end = vid;
-   err = obj->cb(obj);
+   err = cb(vlan);
if (err)
break;
}
@@ -4588,17 +4588,18 @@ static int rocker_port_vlan_dump(const struct 
rocker_port *rocker_port,
 }
 
 static int rocker_port_obj_dump(struct net_device *dev,
-   struct switchdev_obj *obj)
+   enum switchdev_obj_id id, void *obj,
+   int (*cb)(void *obj))
 {
const struct rocker_port *rocker_port = netdev_priv(dev);
int err = 0;
 
-   switch (obj->id) {
+   switch (id) {
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_dump(rocker_port, obj);
+   err = rocker_port_fdb_dump(rocker_port, obj, cb);
break;
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlan_dump(rocker_port, obj);
+   err = rocker_port_vlan_dump(rocker_port, obj, cb);
break;
default:
err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 9ef7c56..a2f57fb 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -120,7 +120,8 @@ struct switchdev_ops {
int (*switchdev_port_obj_del)(struct net_device *dev,
  struct switchdev_obj *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
- struct switchdev_obj *obj);
+  enum switchdev_obj_id id, void *obj,
+  int (*cb)(void *obj));
 };
 
 enum switchdev_notifier_type {
@@ -152,7 +153,8 @@ int switchdev_port_attr_set(struct net_device *dev,
struct switchdev_attr *attr);
 int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj);
 int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj);
-int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj);
+int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id,
+   

Re: unregister_netdevice warnings when deleting netns

2015-09-29 Thread Anand Gurram
Hi Julian and Eric

I tried both of the patches you suggested, but the issue is still
seen: the same warning message is printed on the console:
  "unregister_netdevice: waiting for lo to become free. Usage count = 1".


>Sometimes people have addressed this class of issue with code review,
>but with a slow cleanup you can't catch this by finding a missing
>dev_put.

Yes. Since the cleanup is slow, I am currently unable to trace this
just by counting dev_hold and dev_put calls.


Note that when the namespace is torn down, there is an active TCP
connection present.
When this TCP connection is not present, we do not see the issue.

Any additional ideas and suggestions on debugging in above scenario?

Best Regards,
Anand

On Tue, Sep 29, 2015 at 12:14 PM, Eric W. Biederman
 wrote:
> Anand Gurram  writes:
>
>>>If the message just spits out a few times and then goes away it simply
>>>means that something is taking a while to cleanup and drop it's
>>>reference.
>>
>> The message just spits out few times and then goes away, I am trying
>> to debug why cleanup is taking long,
>> and where it is still referenced. Any pointers in debugging such
>> issues will be of great help.
>
> The one thing I have done in the past is to instrument dev_hold
> and dev_put and look where in the code the stragglers are coming from
> (when I can reproduce the issue reliably).
>
> Sometimes people have addressed this class of issue with code review,
> but with a slow cleanup you can't catch this by finding a missing
> dev_put.
>
> It takes some creativity to find these as people rarely make the same
> mistake twice.
>
> Eric


Re: [PATCH net-next 4/6] net: switchdev: pass callback to dump operation

2015-09-29 Thread kbuild test robot
Hi Vivien,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

config: i386-randconfig-s1-201539 (attached as .config)
reproduce:
  git checkout b215cce51157820c4fb92ecfdc72f281a4286676
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from net/core/rtnetlink.c:47:0:
>> include/net/switchdev.h:216:1: error: expected identifier or '(' before '{' token
 {
 ^
>> include/net/switchdev.h:213:19: warning: 'switchdev_port_obj_dump' declared 'static' but never defined [-Wunused-function]
 static inline int switchdev_port_obj_dump(struct net_device *dev,
                   ^

vim +216 include/net/switchdev.h

491d0f15 Scott Feldman      2015-05-10  207  static inline int switchdev_port_obj_del(struct net_device *dev,
491d0f15 Scott Feldman      2015-05-10  208                                           struct switchdev_obj *obj)
491d0f15 Scott Feldman      2015-05-10  209  {
491d0f15 Scott Feldman      2015-05-10  210          return -EOPNOTSUPP;
491d0f15 Scott Feldman      2015-05-10  211  }
491d0f15 Scott Feldman      2015-05-10  212
45d4122c Samudrala, Sridhar 2015-05-13 @213  static inline int switchdev_port_obj_dump(struct net_device *dev,
b215cce5 Vivien Didelot     2015-09-29  214                                            enum switchdev_obj_id id, void *obj,
b215cce5 Vivien Didelot     2015-09-29  215                                            int (*cb)(void *obj));
45d4122c Samudrala, Sridhar 2015-05-13 @216  {
45d4122c Samudrala, Sridhar 2015-05-13  217          return -EOPNOTSUPP;
45d4122c Samudrala, Sridhar 2015-05-13  218  }
45d4122c Samudrala, Sridhar 2015-05-13  219

:: The code at line 216 was first introduced by commit
:: 45d4122ca7cdb3a4b91f392605cd22cfa75f1d99 switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.

:: TO: Samudrala, Sridhar <sridhar.samudr...@intel.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




[PATCH] testptp: Silence compiler warnings on ppc64

2015-09-29 Thread Thomas Huth
When compiling Documentation/ptp/testptp.c the following compiler
warnings are printed out:

Documentation/ptp/testptp.c: In function ‘main’:
Documentation/ptp/testptp.c:367:11: warning: format ‘%lld’ expects argument
of type ‘long long int’, but argument 3 has type ‘__s64’ [-Wformat=]
   event.t.sec, event.t.nsec);
   ^
Documentation/ptp/testptp.c:505:5: warning: format ‘%lld’ expects argument
of type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
 (pct+2*i)->sec, (pct+2*i)->nsec);
 ^
Documentation/ptp/testptp.c:507:5: warning: format ‘%lld’ expects argument
of type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
 (pct+2*i+1)->sec, (pct+2*i+1)->nsec);
 ^
Documentation/ptp/testptp.c:509:5: warning: format ‘%lld’ expects argument
of type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
 (pct+2*i+2)->sec, (pct+2*i+2)->nsec);

This happens because __s64 is by default defined as "long" on ppc64,
not as "long long". However, to fix these warnings, it's possible to
define the __SANE_USERSPACE_TYPES__ so that __s64 gets defined to
"long long" on ppc64, too.
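
For comparison only, and not what this patch does: another common way to avoid this class of -Wformat warning is an explicit cast at the call site. The helper below is a hypothetical user-space sketch; `demo_format_ts()` is not part of testptp.c, and `int64_t` stands in for `__s64`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a seconds/nanoseconds pair. The cast makes the argument match
 * "%lld" regardless of whether the 64-bit type is "long" or "long long".
 */
static int demo_format_ts(char *buf, size_t len, int64_t sec,
			  unsigned int nsec)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lld.%09u", (long long)sec, nsec);
}
```

The define used in the patch has the advantage of leaving every printf call site untouched.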

Signed-off-by: Thomas Huth 
---
 Documentation/ptp/testptp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 2bc8abc..6c6247a 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -18,6 +18,7 @@
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__/* For PPC64, to get LL64 types */
 #include 
 #include 
 #include 
-- 
1.8.3.1



[MM PATCH V4 6/6] slub: optimize bulk slowpath free by detached freelist

2015-09-29 Thread Jesper Dangaard Brouer
This change focuses on improving the speed of object freeing in the
"slowpath" of kmem_cache_free_bulk.

The calls slab_free (fastpath) and __slab_free (slowpath) have been
extended with support for bulk free, which amortizes the overhead of
the (locked) cmpxchg_double.

To use the new bulking feature, we build what I call a detached
freelist.  The detached freelist takes advantage of three properties:

 1) the free function call owns the object that is about to be freed,
thus writing into this memory is synchronization-free.

 2) many freelists can co-exist side-by-side in the same slab-page
each with a separate head pointer.

 3) it is the visibility of the head pointer that needs synchronization.

Given these properties, the brilliant part is that the detached
freelist can be constructed without any need for synchronization.  The
freelist is constructed directly in the page objects, while the
detached freelist itself (holding the head pointer) is allocated on the
stack of the kmem_cache_free_bulk call.  Thus, the freelist head
pointer is not visible to other CPUs.

All objects in a SLUB freelist must belong to the same slab-page.
Thus, constructing the detached freelist is about matching objects
that belong to the same slab-page.  The bulk free array is scanned in
a progressive manner with a limited look-ahead facility.
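
The scan described above can be sketched in plain userspace C.  This is
a simplified model under stated assumptions, not the code from this
patch: a "slab-page" is approximated by masking the object address to a
4 KiB boundary, the look-ahead limit of 3 is an arbitrary choice for
the sketch, the free pointer is assumed to live in the first word of
each object, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: one slab == one 4 KiB page */
#define SKETCH_LOOKAHEAD 3	/* assumption: how many mismatches we tolerate */

struct detached_freelist {
	uintptr_t page;		/* page shared by all linked objects */
	void *head;		/* first object of the private list */
	void *tail;		/* last object of the private list */
	int cnt;		/* number of linked objects */
};

static uintptr_t page_of(const void *obj)
{
	return (uintptr_t)obj & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* The caller owns obj, so this write needs no synchronization. */
static void set_next(void *obj, void *next)
{
	*(void **)obj = next;
}

/*
 * Scan p[0..size) from the end, linking objects that share a page with
 * the last unprocessed object.  Linked entries are set to NULL.  Returns
 * how many leading entries of p[] still need another pass (0 == done).
 */
static size_t build_detached_freelist(void **p, size_t size,
				      struct detached_freelist *df)
{
	size_t first_skipped = 0;
	int lookahead = SKETCH_LOOKAHEAD;
	void *obj;

	df->page = 0;
	df->head = df->tail = NULL;
	df->cnt = 0;

	while (size > 0 && p[size - 1] == NULL)
		size--;			/* skip already processed entries */
	if (size == 0)
		return 0;

	obj = p[--size];
	p[size] = NULL;
	set_next(obj, NULL);
	df->page = page_of(obj);
	df->head = df->tail = obj;
	df->cnt = 1;

	while (size > 0) {
		obj = p[--size];
		if (obj == NULL)
			continue;
		if (page_of(obj) == df->page) {
			set_next(obj, df->head);	/* push in front */
			df->head = obj;
			df->cnt++;
			p[size] = NULL;
			continue;
		}
		if (--lookahead == 0)
			break;		/* limited look-ahead exhausted */
		if (first_skipped == 0)
			first_skipped = size + 1;
	}
	return first_skipped;
}
```

A caller would loop, handing each resulting (head, tail, cnt) triple to
the bulk-capable free routine and calling the builder again with the
returned size until it returns 0.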

Kmem debug support is handled in the call to slab_free().

Notice kmem_cache_free_bulk no longer needs to disable IRQs. This
only slowed down single-object bulk free by approx 3 cycles.


Performance data:
 Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz

SLUB fastpath single object quick reuse: 47 cycles(tsc) 11.931 ns

To get stable and comparable numbers, the kernel has been booted with
"slab_nomerge" (this also improves performance for larger bulk sizes).

Performance data, compared against fallback bulking:

bulk -  fallback bulk- improvement with this patch
   1 -  62 cycles(tsc) 15.662 ns - 49 cycles(tsc) 12.407 ns- improved 21.0%
   2 -  55 cycles(tsc) 13.935 ns - 30 cycles(tsc) 7.506 ns - improved 45.5%
   3 -  53 cycles(tsc) 13.341 ns - 23 cycles(tsc) 5.865 ns - improved 56.6%
   4 -  52 cycles(tsc) 13.081 ns - 20 cycles(tsc) 5.048 ns - improved 61.5%
   8 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.659 ns - improved 64.0%
  16 -  49 cycles(tsc) 12.412 ns - 17 cycles(tsc) 4.495 ns - improved 65.3%
  30 -  49 cycles(tsc) 12.484 ns - 18 cycles(tsc) 4.533 ns - improved 63.3%
  32 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.707 ns - improved 64.0%
  34 -  96 cycles(tsc) 24.243 ns - 23 cycles(tsc) 5.976 ns - improved 76.0%
  48 -  83 cycles(tsc) 20.818 ns - 21 cycles(tsc) 5.329 ns - improved 74.7%
  64 -  74 cycles(tsc) 18.700 ns - 20 cycles(tsc) 5.127 ns - improved 73.0%
 128 -  90 cycles(tsc) 22.734 ns - 27 cycles(tsc) 6.833 ns - improved 70.0%
 158 -  99 cycles(tsc) 24.776 ns - 30 cycles(tsc) 7.583 ns - improved 69.7%
 250 - 104 cycles(tsc) 26.089 ns - 37 cycles(tsc) 9.280 ns - improved 64.4%

Performance data, compared against current in-kernel bulking:

bulk - curr in-kernel  - improvement with this patch
   1 -  46 cycles(tsc) - 49 cycles(tsc) - improved (cycles:-3) -6.5%
   2 -  27 cycles(tsc) - 30 cycles(tsc) - improved (cycles:-3) -11.1%
   3 -  21 cycles(tsc) - 23 cycles(tsc) - improved (cycles:-2) -9.5%
   4 -  18 cycles(tsc) - 20 cycles(tsc) - improved (cycles:-2) -11.1%
   8 -  17 cycles(tsc) - 18 cycles(tsc) - improved (cycles:-1) -5.9%
  16 -  18 cycles(tsc) - 17 cycles(tsc) - improved (cycles: 1)  5.6%
  30 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
  32 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
  34 -  78 cycles(tsc) - 23 cycles(tsc) - improved (cycles:55) 70.5%
  48 -  60 cycles(tsc) - 21 cycles(tsc) - improved (cycles:39) 65.0%
  64 -  49 cycles(tsc) - 20 cycles(tsc) - improved (cycles:29) 59.2%
 128 -  69 cycles(tsc) - 27 cycles(tsc) - improved (cycles:42) 60.9%
 158 -  79 cycles(tsc) - 30 cycles(tsc) - improved (cycles:49) 62.0%
 250 -  86 cycles(tsc) - 37 cycles(tsc) - improved (cycles:49) 57.0%

Performance with normal SLUB merging is significantly slower for
larger bulking.  This is believed to (primarily) be an effect of not
having to share the per-CPU data-structures, as tuning per-CPU size
can achieve similar performance.

bulk - slab_nomerge   -  normal SLUB merge
   1 -  49 cycles(tsc) - 49 cycles(tsc) - merge slower with cycles:0
   2 -  30 cycles(tsc) - 30 cycles(tsc) - merge slower with cycles:0
   3 -  23 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:0
   4 -  20 cycles(tsc) - 20 cycles(tsc) - merge slower with cycles:0
   8 -  18 cycles(tsc) - 18 cycles(tsc) - merge slower with cycles:0
  16 -  17 cycles(tsc) - 17 cycles(tsc) - merge slower with cycles:0
  30 -  18 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:5
  32 -  18 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:4
  34 -  23 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:-1
  48 -  21 

[PATCH v2 net-next 6/6] net: switchdev: extract struct switchdev_obj_*

2015-09-29 Thread Vivien Didelot
Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rid of this
outer structure.

Signed-off-by: Vivien Didelot 
---
 include/net/switchdev.h | 53 -
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index bcadac3..e11425e 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -64,30 +64,29 @@ enum switchdev_obj_id {
SWITCHDEV_OBJ_PORT_FDB,
 };
 
-struct switchdev_obj {
-   enum switchdev_obj_id id;
-   int (*cb)(struct switchdev_obj *obj);
-   union {
-   struct switchdev_obj_vlan { /* PORT_VLAN */
-   u16 flags;
-   u16 vid_begin;
-   u16 vid_end;
-   } vlan;
-   struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
-   u32 dst;
-   int dst_len;
-   struct fib_info *fi;
-   u8 tos;
-   u8 type;
-   u32 nlflags;
-   u32 tb_id;
-   } ipv4_fib;
-   struct switchdev_obj_fdb {  /* PORT_FDB */
-   const unsigned char *addr;
-   u16 vid;
-   u16 ndm_state;
-   } fdb;
-   } u;
+/* SWITCHDEV_OBJ_PORT_VLAN */
+struct switchdev_obj_vlan {
+   u16 flags;
+   u16 vid_begin;
+   u16 vid_end;
+};
+
+/* SWITCHDEV_OBJ_IPV4_FIB */
+struct switchdev_obj_ipv4_fib {
+   u32 dst;
+   int dst_len;
+   struct fib_info *fi;
+   u8 tos;
+   u8 type;
+   u32 nlflags;
+   u32 tb_id;
+};
+
+/* SWITCHDEV_OBJ_PORT_FDB */
+struct switchdev_obj_fdb {
+   const unsigned char *addr;
+   u16 vid;
+   u16 ndm_state;
 };
 
 void switchdev_trans_item_enqueue(struct switchdev_trans *trans,
@@ -102,11 +101,11 @@ void *switchdev_trans_item_dequeue(struct switchdev_trans *trans);
  *
  * @switchdev_port_attr_set: Set a port attribute (see switchdev_attr).
  *
- * @switchdev_port_obj_add: Add an object to port (see switchdev_obj).
+ * @switchdev_port_obj_add: Add an object to port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj).
+ * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj).
+ * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj_*).
  */
 struct switchdev_ops {
int (*switchdev_port_attr_get)(struct net_device *dev,
-- 
2.6.0



Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Tue, 29 Sep 2015 09:38:30 -0700
Alexander Duyck  wrote:

> On 09/29/2015 08:48 AM, Jesper Dangaard Brouer wrote:
> > Make it possible to free a freelist with several objects by adjusting
> > API of slab_free() and __slab_free() to have head, tail and an objects
> > counter (cnt).
> >
> > Tail being NULL indicate single object free of head object.  This
> > allow compiler inline constant propagation in slab_free() and
> > slab_free_freelist_hook() to avoid adding any overhead in case of
> > single object free.
> >
> > This allows a freelist with several objects (all within the same
> > slab-page) to be free'ed using a single locked cmpxchg_double in
> > __slab_free() and with an unlocked cmpxchg_double in slab_free().
> >
> > Object debugging on the free path is also extended to handle these
> > freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
> > objects don't belong to the same slab-page.
> >
> > These changes are needed for the next patch to bulk free the detached
> > freelists it introduces and constructs.
> >
> > Micro benchmarking showed no performance reduction due to this change,
> > when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).
> >
> > Signed-off-by: Jesper Dangaard Brouer 
> > Signed-off-by: Alexander Duyck 
> >
> > ---
> > V4:
> >   - Change API per req of Christoph Lameter
> >   - Remove comments in init_object.
> >
[...]
> >
> > +/* Compiler cannot detect that slab_free_freelist_hook() can be
> > + * removed if slab_free_hook() evaluates to nothing.  Thus, we need to
> > + * catch all relevant config debug options here.
> > + */
> 
> Is it actually generating nothing but a pointer walking loop or is there 
> a bit of code cruft that is being evaluated inside the loop?

If any of the defines are activated, then slab_free_hook(s, object)
will generate some code.

In the case of single object free, the compiler sees that it can
remove the loop, and also notices if slab_free_hook() evaluates to
nothing.

The compiler is not smart enough to remove the loop in the multi-object
case, even though it can see that slab_free_hook() evaluates to nothing
(in that case it does a pointer walk without evaluating any code).
Thus, I need this construct.

> > +#if defined(CONFIG_KMEMCHECK) ||   \
> > +   defined(CONFIG_LOCKDEP) ||  \
> > +   defined(CONFIG_DEBUG_KMEMLEAK) ||   \
> > +   defined(CONFIG_DEBUG_OBJECTS_FREE) ||   \
> > +   defined(CONFIG_KASAN)
> > +static inline void slab_free_freelist_hook(struct kmem_cache *s,
> > +  void *head, void *tail)
> > +{
> > +   void *object = head;
> > +   void *tail_obj = tail ? : head;
> > +
> > +   do {
> > +   slab_free_hook(s, object);
> > +   } while ((object != tail_obj) &&
> > +(object = get_freepointer(s, object)));
> > +}
> > +#else
> > +static inline void slab_free_freelist_hook(struct kmem_cache *s, void 
> > *obj_tail,
> > +  void *freelist_head) {}
> > +#endif
> > +
> 
> Instead of messing around with an #else you might just wrap the contents 
> of slab_free_freelist_hook in the #if/#endif instead of the entire 
> function declaration.

I had it that way in an earlier version of the patch, but I like it
better this way.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Fw: [Bug 105221] New: system panics under load on mlx4_en interfaces

2015-09-29 Thread Stephen Hemminger


Begin forwarded message:

Date: Tue, 29 Sep 2015 07:19:32 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 105221] New: system panics under load on mlx4_en interfaces


https://bugzilla.kernel.org/show_bug.cgi?id=105221

Bug ID: 105221
   Summary: system panics under load on mlx4_en interfaces
   Product: Networking
   Version: 2.5
Kernel Version: 4.3.0-rc3-vanilla
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: shemmin...@linux-foundation.org
  Reporter: tho...@drewermann.org
Regression: No

We are using an HP ProLiant DL320e Gen8 with a dual-port Mellanox
ConnectX-2 EN NIC (P/N: MNPH29D_A2-A5) installed. BIOS, iLO, microcode and
NIC firmware are up to date. We already tried changing interrupts. All
offloading features are currently disabled:
Features for eth2:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]

When putting load on those NICs we get a kernel panic. The issue can be
reproduced at any time. Kernel version doesn't make any difference.

[  176.892495] ------------[ cut here ]------------
[  176.892513] kernel BUG at net/core/skbuff.c:2097!
[  176.892525] invalid opcode:  [#1] SMP
[  176.892538] Modules linked in: cpufreq_stats cpufreq_userspace
cpufreq_powersave iptable_filter cpufreq_conservative xt_CT nf_conntrack
iptable_raw ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
grace fscache sunrpc ip_gre ip_tunnel gre intel_rapl iosf_mbi
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha256_generic hmac drbg
ansi_cprng aesni_intel mgag200 aes_x86_64 lrw ttm drm_kms_helper gf128mul
glue_helper drm ablk_helper iTCO_wdt cryptd iTCO_vendor_support joydev evdev
psmouse ie31200_edac serio_raw hpilo i2c_algo_bit edac_core lpc_ich hpwdt
snd_pcm snd_timer snd 8250_fintek soundcore pcspkr mfd_core ipmi_si
ipmi_msghandler shpchp button pcc_cpufreq acpi_cpufreq processor
acpi_power_meter 8021q
[  176.892778]  garp mrp stp llc dummy autofs4 ext4 crc16 mbcache jbd2 dm_mod
mlx4_en vxlan ip6_udp_tunnel udp_tunnel sg sd_mod uas usb_storage scsi_mod
hid_generic usbhid hid crc32c_intel mlx4_core ehci_pci uhci_hcd tg3 ehci_hcd
ptp pps_core libphy usbcore usb_common thermal
[  176.892868] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-rc3-vanillaice
#1
[  176.892885] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[  176.892902] task: 81814540 ti: 8180 task.ti:
8180
[  176.892919] RIP: 0010:[]  []
__skb_checksum+0x2d6/0x2f0
[  176.892942] RSP: 0018:8802474038f8  EFLAGS: 00010286
[  176.892955] RAX: 12f3 RBX: 12f3 RCX:
0ec6
[  176.892972] RDX: 88022ce1d980 RSI: 12f3 RDI:
8800afed4400
[  176.892988] RBP:  R08: 880247403978 R09:
12f3
[  176.893005] R10: 88022ce1d300 R11: 0002 R12:

[  176.893021] R13:  R14: 12f3 R15:

[  176.893038] FS:  () GS:88024740()
knlGS:
[  176.893056] CS:  0010 DS:  ES:  CR0: 80050033
[  176.893070] CR2: 7f42a19c CR3: 0180d000 CR4:
001406f0
[  176.893086] Stack:
[  176.893092]  b0ddb200 880247403978 12f3
81814540
[  176.893113]  81814540 81814540 
8800
[  176.893134]  0246 8800afed4400 

[MM PATCH V4 4/6] slab: implement bulking for SLAB allocator

2015-09-29 Thread Jesper Dangaard Brouer
Implement a basic approach of bulking in the SLAB allocator. Simply
use local_irq_{disable,enable} and call single alloc/free in a loop.
This simple implementation approach is surprisingly fast.

Notice the normal SLAB fastpath is: 96 cycles (24.119 ns). The table
below shows that single object bulking only takes 42 cycles.  This can
be explained by the bulk API's requirement to be called from a known
interrupt context, that is, with interrupts enabled.  This allows us to
avoid the expensive (37 cycles) local_irq_{save,restore}, and instead
use the much faster (7 cycles) local_irq_{disable,enable}.

Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz:

bulk - Current  - simple SLAB bulk implementation
  1 - 115 cycles(tsc) 28.812 ns - 42 cycles(tsc) 10.715 ns - improved 63.5%
  2 - 103 cycles(tsc) 25.956 ns - 27 cycles(tsc)  6.985 ns - improved 73.8%
  3 - 101 cycles(tsc) 25.336 ns - 22 cycles(tsc)  5.733 ns - improved 78.2%
  4 - 100 cycles(tsc) 25.147 ns - 21 cycles(tsc)  5.319 ns - improved 79.0%
  8 -  98 cycles(tsc) 24.616 ns - 18 cycles(tsc)  4.620 ns - improved 81.6%
 16 -  97 cycles(tsc) 24.408 ns - 17 cycles(tsc)  4.344 ns - improved 82.5%
 30 -  98 cycles(tsc) 24.641 ns - 16 cycles(tsc)  4.202 ns - improved 83.7%
 32 -  98 cycles(tsc) 24.607 ns - 16 cycles(tsc)  4.199 ns - improved 83.7%
 34 -  98 cycles(tsc) 24.605 ns - 18 cycles(tsc)  4.579 ns - improved 81.6%
 48 -  97 cycles(tsc) 24.463 ns - 17 cycles(tsc)  4.405 ns - improved 82.5%
 64 -  97 cycles(tsc) 24.370 ns - 17 cycles(tsc)  4.384 ns - improved 82.5%
128 -  99 cycles(tsc) 24.763 ns - 19 cycles(tsc)  4.755 ns - improved 80.8%
158 -  98 cycles(tsc) 24.708 ns - 18 cycles(tsc)  4.723 ns - improved 81.6%
250 - 101 cycles(tsc) 25.342 ns - 20 cycles(tsc)  5.035 ns - improved 80.2%
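
The "improved" column appears to be the relative reduction in cycles,
(current - new) / current.  That reading is inferred from the numbers
in the table rather than stated anywhere in this mail; the helper name
below is made up for the illustration.

```c
/*
 * Relative improvement in percent when the per-object cost drops from
 * cur_cycles to new_cycles.  A negative result means a slowdown.
 */
static double improvement_pct(unsigned int cur_cycles, unsigned int new_cycles)
{
	return 100.0 * ((double)cur_cycles - (double)new_cycles) /
	       (double)cur_cycles;
}
```

For the first row, improvement_pct(115, 42) comes out at roughly 63.5,
matching the table.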

Also notice how well bulking maintains the performance when the bulk
size increases (which is a sore spot for the SLUB allocator).

Increasing the bulk size further:
 20 cycles(tsc)  5.214 ns (bulk: 512)
 30 cycles(tsc)  7.734 ns (bulk: 768)
 40 cycles(tsc) 10.244 ns (bulk:1024)
 72 cycles(tsc) 18.049 ns (bulk:2048)
 90 cycles(tsc) 22.585 ns (bulk:4096)

It is not recommended to perform large bulking with SLAB, as
local interrupts are disabled for the entire period.  If this
kind of use case emerges, this interface should be adjusted to
reduce the interrupts-off period.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c

Signed-off-by: Jesper Dangaard Brouer 
Acked-by: Christoph Lameter 
---
 mm/slab.c |   87 +++--
 1 file changed, 62 insertions(+), 25 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index c77ebe6cc87c..21da6b1ccae3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3234,11 +3234,15 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller,
+  bool irq_off_needed)
 {
unsigned long save_flags;
void *objp;
 
+   /* Compiler need to remove irq_off_needed branch statements */
+   BUILD_BUG_ON(!__builtin_constant_p(irq_off_needed));
+
flags &= gfp_allowed_mask;
 
lockdep_trace_alloc(flags);
@@ -3249,9 +3253,11 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, 
unsigned long caller)
cachep = memcg_kmem_get_cache(cachep, flags);
 
cache_alloc_debugcheck_before(cachep, flags);
-   local_irq_save(save_flags);
+   if (irq_off_needed)
+   local_irq_save(save_flags);
objp = __do_cache_alloc(cachep, flags);
-   local_irq_restore(save_flags);
+   if (irq_off_needed)
+   local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
kmemleak_alloc_recursive(objp, cachep->object_size, 1, cachep->flags,
 flags);
@@ -3407,7 +3413,7 @@ static inline void __cache_free(struct kmem_cache 
*cachep, void *objp,
  */
 void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 {
-   void *ret = slab_alloc(cachep, flags, _RET_IP_);
+   void *ret = slab_alloc(cachep, flags, _RET_IP_, true);
 
trace_kmem_cache_alloc(_RET_IP_, ret,
   cachep->object_size, cachep->size, flags);
@@ -3416,16 +3422,23 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t 
flags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
-{
-   __kmem_cache_free_bulk(s, size, p);
-}
-EXPORT_SYMBOL(kmem_cache_free_bulk);
-
+/* Note that interrupts must be enabled when calling this function. */
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
-   void **p)
+  

[v4.3.0-rc2->3] Regression: BIG networking performance loss

2015-09-29 Thread Jörg Otte
With kernels vmlinuz-4.3.0-rc2-00228-gd4a748a and earlier it is no
problem for me to stream HD-videos (700-800 Kbyte/s) from YouTube.

With the same video material and kernels
vmlinuz-4.3.0-rc2-00438-gd8cc397 and later I only reach 70-80 KByte/s.
That's one-tenth of what it was before.

The merges between 00228 -> 00438 are:
d8cc397 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
c91d707 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
bcba282 Merge tag 'usb-4.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
fb740f9 Merge tag 'tty-4.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
b11e7b8 Merge tag 'staging-4.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
7c1efea Merge tag 'driver-core-4.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
64b796e Merge tag 'char-misc-4.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
518a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411
PCI Express Gigabit Ethernet Controller (rev 07)
Driver:r8169.

Thanks, Jörg


[PATCH v2 net-next 0/6] net: switchdev: use specific switchdev_obj_*

2015-09-29 Thread Vivien Didelot
This patchset changes switchdev add, del, dump operations from this:

int (*switchdev_port_obj_add)(struct net_device *dev,
  struct switchdev_obj *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
  struct switchdev_obj *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
  struct switchdev_obj *obj);

to something similar to the notifier_call callback of a notifier_block:

int (*switchdev_port_obj_add)(struct net_device *dev,
  enum switchdev_obj_id id,
  const void *obj,
  struct switchdev_trans *trans);
int (*switchdev_port_obj_del)(struct net_device *dev,
  enum switchdev_obj_id id,
  const void *obj);
int (*switchdev_port_obj_dump)(struct net_device *dev,
   enum switchdev_obj_id id, void *obj,
   int (*cb)(void *obj));

This allows the caller to pass and expect back a specific switchdev_obj_*
structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.

This will simplify pushing the callback function down to the drivers.
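
To make the new calling convention concrete, here is a userspace mock
of how a driver-side add handler might dispatch on the object id.  The
enum and structs are reduced stand-ins for the kernel definitions,
example_port_obj_add() is a hypothetical name, and the net_device and
transaction arguments are left out to keep the sketch self-contained.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions (sketch only). */
enum switchdev_obj_id {
	SWITCHDEV_OBJ_PORT_VLAN,
	SWITCHDEV_OBJ_IPV4_FIB,
	SWITCHDEV_OBJ_PORT_FDB,
};

struct switchdev_obj_vlan {
	uint16_t flags;
	uint16_t vid_begin;
	uint16_t vid_end;
};

struct switchdev_obj_fdb {
	const unsigned char *addr;
	uint16_t vid;
	uint16_t ndm_state;
};

/*
 * Hypothetical handler: the id tells which specific structure obj
 * points to.  Returns the number of entries it would program, or a
 * negative errno for invalid or unsupported objects.
 */
static int example_port_obj_add(enum switchdev_obj_id id, const void *obj)
{
	switch (id) {
	case SWITCHDEV_OBJ_PORT_VLAN: {
		const struct switchdev_obj_vlan *vlan = obj;

		if (vlan->vid_end < vlan->vid_begin)
			return -EINVAL;
		return vlan->vid_end - vlan->vid_begin + 1;
	}
	case SWITCHDEV_OBJ_PORT_FDB: {
		const struct switchdev_obj_fdb *fdb = obj;

		return fdb->addr ? 1 : -EINVAL;
	}
	default:
		return -EOPNOTSUPP;
	}
}
```

The trade-off is the usual one for a void pointer plus a type tag: the
compiler can no longer check that id and obj agree, so each handler has
to trust the caller on that point.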

The first 3 patches get rid of the dev parameter of the dump callback, since it
is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers)
may not have easy access to it.

Patches 4 and 5 implement the change in the switchdev operations and its users.

Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
removes the latter.

v2: fix error spotted by kbuild (extra ';' inline switchdev_port_obj_dump).

Vivien Didelot (6):
  net: switchdev: remove dev in port_vlan_dump_put
  net: switchdev: move dev in switchdev_fdb_dump
  net: switchdev: remove dev from switchdev_obj cb
  net: switchdev: pass callback to dump operation
  net: switchdev: abstract object in add/del ops
  net: switchdev: extract struct switchdev_obj_*

 drivers/net/ethernet/rocker/rocker.c |  42 
 include/net/switchdev.h  |  80 ---
 net/bridge/br_fdb.c  |  11 +--
 net/bridge/br_vlan.c |  24 ++---
 net/dsa/slave.c  |  46 +
 net/switchdev/switchdev.c| 184 ---
 6 files changed, 186 insertions(+), 201 deletions(-)

-- 
2.6.0



[PATCH v2 net-next 3/6] net: switchdev: remove dev from switchdev_obj cb

2015-09-29 Thread Vivien Didelot
The net_device associated with a dump operation does not have to be
passed to the callback. switchdev stores it in a superset struct, if
needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.
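
The "superset struct" idea can be shown with a small userspace mock.
The struct and function names below are hypothetical and the
container_of() macro is a simplified userspace version; the point is
only that the callback can recover its own context from the object
pointer, so the driver never has to pass a device along.

```c
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in for struct switchdev_obj: callback plus one field. */
struct mock_obj {
	int (*cb)(struct mock_obj *obj);
	int vid;
};

/* Superset struct owned by the caller; it embeds the object it hands out. */
struct mock_vlan_dump {
	struct mock_obj obj;
	int ifindex;	/* stands in for the net_device the dump refers to */
	int entries;	/* how many times the callback ran */
};

static int mock_vlan_dump_cb(struct mock_obj *obj)
{
	/* Recover the enclosing dump context from the embedded member. */
	struct mock_vlan_dump *dump =
		container_of(obj, struct mock_vlan_dump, obj);

	dump->entries++;
	return 0;
}

/* Hypothetical driver loop: it only knows about the object. */
static int mock_driver_vlan_dump(struct mock_obj *obj, int first_vid,
				 int last_vid)
{
	int vid, err;

	for (vid = first_vid; vid <= last_vid; vid++) {
		obj->vid = vid;
		err = obj->cb(obj);
		if (err)
			return err;
	}
	return 0;
}
```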

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 4 ++--
 include/net/switchdev.h  | 2 +-
 net/dsa/slave.c  | 4 ++--
 net/switchdev/switchdev.c| 6 ++
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index d3f6632..78fd443 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
-   err = obj->cb(rocker_port->dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port 
*rocker_port,
if (rocker_vlan_id_is_internal(htons(vid)))
vlan->flags |= BRIDGE_VLAN_INFO_PVID;
vlan->vid_begin = vlan->vid_end = vid;
-   err = obj->cb(rocker_port->dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1820787..9ef7c56 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -66,7 +66,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
enum switchdev_obj_id id;
-   int (*cb)(struct net_device *dev, struct switchdev_obj *obj);
+   int (*cb)(struct switchdev_obj *obj);
union {
struct switchdev_obj_vlan { /* PORT_VLAN */
u16 flags;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f18cae5..0b47647 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,7 +334,7 @@ static int dsa_slave_port_vlan_dump(struct net_device *dev,
if (test_bit(p->port, untagged))
vlan->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
 
-   err = obj->cb(dev, obj);
+   err = obj->cb(obj);
if (err)
break;
}
@@ -397,7 +397,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
obj->u.fdb.vid = vid;
obj->u.fdb.ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
-   ret = obj->cb(dev, obj);
+   ret = obj->cb(obj);
if (ret < 0)
break;
}
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index c0e2047..93f4971 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -514,8 +514,7 @@ static int switchdev_port_vlan_dump_put(struct 
switchdev_vlan_dump *dump)
return 0;
 }
 
-static int switchdev_port_vlan_dump_cb(struct net_device *dev,
-  struct switchdev_obj *obj)
+static int switchdev_port_vlan_dump_cb(struct switchdev_obj *obj)
 {
struct switchdev_vlan_dump *dump =
container_of(obj, struct switchdev_vlan_dump, obj);
@@ -864,8 +863,7 @@ struct switchdev_fdb_dump {
int idx;
 };
 
-static int switchdev_port_fdb_dump_cb(struct net_device *dev,
- struct switchdev_obj *obj)
+static int switchdev_port_fdb_dump_cb(struct switchdev_obj *obj)
 {
struct switchdev_fdb_dump *dump =
container_of(obj, struct switchdev_fdb_dump, obj);
-- 
2.6.0



[PATCH net-next v2] net: Add support for filtering neigh dump by master device

2015-09-29 Thread David Ahern
Add support for filtering neighbor dumps by master device by adding
the NDA_MASTER attribute to the dump request. A new netlink flag,
NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
request and output is filtered as requested.

Signed-off-by: David Ahern 
---
v2
- added NLM_F_DUMP_FILTERED flag for userspace feedback that request is
  supported

This method works for other filters as well and other dump commands.
Works fine for all combinations of new and old kernel and new and old ip:
1. new ip command on old kernel, NDA_MASTER attribute is ignored
2. old ip command on new kernel, NDA_MASTER attribute is not present
3. new ip on new kernel ... goodness ensues by limiting data to
   only what user wants
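
On the userspace side, the three cases above boil down to checking the
new flag on each reply header.  The sketch below is an assumption about
how a tool might do that, not code from iproute2; dump_was_filtered()
is a hypothetical helper, and the fallback define uses the value 32
introduced by this patch for headers that do not have the flag yet.

```c
#include <linux/netlink.h>

/* Older UAPI headers lack this flag; 32 is the value used in this patch. */
#ifndef NLM_F_DUMP_FILTERED
#define NLM_F_DUMP_FILTERED 32
#endif

/*
 * Hypothetical helper: returns 1 if the kernel marked this dump reply
 * as already filtered, 0 if the caller must filter entries itself
 * (for example because an old kernel ignored the NDA_MASTER attribute).
 */
static int dump_was_filtered(const struct nlmsghdr *nlh)
{
	return (nlh->nlmsg_flags & NLM_F_DUMP_FILTERED) != 0;
}
```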

 include/uapi/linux/netlink.h |  1 +
 net/core/neighbour.c | 32 +++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 6f3fe16cd22a..f095155d8749 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -54,6 +54,7 @@ struct nlmsghdr {
 #define NLM_F_ACK              4   /* Reply with ack, with zero or error code */
 #define NLM_F_ECHO             8   /* Echo this request */
 #define NLM_F_DUMP_INTR        16  /* Dump was inconsistent due to sequence change */
+#define NLM_F_DUMP_FILTERED    32  /* Dump was filtered as requested */
 
 /* Modifiers to GET request */
 #define NLM_F_ROOT 0x100   /* specify tree root*/
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2b515ba7e94f..8c57fdf4d68e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2235,14 +2235,42 @@ static void neigh_update_notify(struct neighbour *neigh)
__neigh_notify(neigh, RTM_NEWNEIGH, 0);
 }
 
+static bool neigh_master_filtered(struct net_device *dev, int master_idx)
+{
+   struct net_device *master;
+
+   if (!master_idx)
+   return false;
+
+   master = netdev_master_upper_dev_get(dev);
+   if (!master || master->ifindex != master_idx)
+   return true;
+
+   return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
struct netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
+   const struct nlmsghdr *nlh = cb->nlh;
+   struct nlattr *tb[NDA_MAX + 1];
struct neighbour *n;
int rc, h, s_h = cb->args[1];
int idx, s_idx = idx = cb->args[2];
struct neigh_hash_table *nht;
+   int filter_master_idx = 0;
+   unsigned int flags = NLM_F_MULTI;
+   int err;
+
+   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL);
+   if (!err) {
+   if (tb[NDA_MASTER])
+   filter_master_idx = nla_get_u32(tb[NDA_MASTER]);
+
+   if (filter_master_idx)
+   flags |= NLM_F_DUMP_FILTERED;
+   }
 
rcu_read_lock_bh();
nht = rcu_dereference_bh(tbl->nht);
@@ -2255,12 +2283,14 @@ static int neigh_dump_table(struct neigh_table *tbl, 
struct sk_buff *skb,
 n = rcu_dereference_bh(n->next)) {
if (!net_eq(dev_net(n->dev), net))
continue;
+   if (neigh_master_filtered(n->dev, filter_master_idx))
+   continue;
if (idx < s_idx)
goto next;
if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWNEIGH,
-   NLM_F_MULTI) < 0) {
+   flags) < 0) {
rc = -1;
goto out;
}
-- 
1.9.1



[MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
Make it possible to free a freelist with several objects by adjusting
the API of slab_free() and __slab_free() to take a head, a tail and an
object counter (cnt).

Tail being NULL indicates a single object free of the head object.
This allows compiler inline constant propagation in slab_free() and
slab_free_freelist_hook() to avoid adding any overhead in case of
single object free.

This allows a freelist with several objects (all within the same
slab-page) to be free'ed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().

Object debugging on the free path is also extended to handle these
freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.

These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.

Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Alexander Duyck 

---
V4:
 - Change API per req of Christoph Lameter
 - Remove comments in init_object.

 mm/slub.c |   87 -
 1 file changed, 69 insertions(+), 18 deletions(-)
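
For readers following the head/tail/cnt convention, here is a minimal
userspace sketch (not kernel code) of the kind of walk the debug path
does over a constructed freelist.  The struct obj type and the helpers
tail_or_head() and freelist_count() are hypothetical names used only for
illustration; the only things taken from the description above are that
tail == NULL means a single object free of head, and that the number of
objects walked must match the count passed by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab object: free pointer is the first word. */
struct obj {
	struct obj *next;
};

/* tail == NULL means "single object free of head". */
static struct obj *tail_or_head(struct obj *head, struct obj *tail)
{
	return tail ? tail : head;
}

/*
 * Walk from head to tail (inclusive) and return the number of objects
 * visited, or -1 if the list ends before tail is reached.
 */
static int freelist_count(struct obj *head, struct obj *tail)
{
	struct obj *end = tail_or_head(head, tail);
	struct obj *object = head;
	int cnt = 0;

	assert(head != NULL);

	while (object) {
		cnt++;
		if (object == end)
			return cnt;
		object = object->next;
	}
	return -1;
}
```

A caller would compare the returned value against its own object count
and treat a mismatch (or -1) as a corrupted freelist.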

diff --git a/mm/slub.c b/mm/slub.c
index 1cf98d89546d..7c2abc33fd4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1063,11 +1063,15 @@ bad:
return 0;
 }
 
+/* Supports checking bulk free of a constructed freelist */
 static noinline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags)
 {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+   void *object = head;
+   int cnt = 0;
 
spin_lock_irqsave(&n->list_lock, *flags);
slab_lock(page);
@@ -1075,6 +1079,9 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (!check_slab(s, page))
goto fail;
 
+next_object:
+   cnt++;
+
if (!check_valid_pointer(s, page, object)) {
slab_err(s, page, "Invalid object pointer 0x%p", object);
goto fail;
@@ -1105,8 +1112,19 @@ static noinline struct kmem_cache_node 
*free_debug_processing(
if (s->flags & SLAB_STORE_USER)
set_track(s, object, TRACK_FREE, addr);
trace(s, page, object, 0);
+   /* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
init_object(s, object, SLUB_RED_INACTIVE);
+
+   /* Reached end of constructed freelist yet? */
+   if (object != tail) {
+   object = get_freepointer(s, object);
+   goto next_object;
+   }
 out:
+   if (cnt != bulk_cnt)
+   slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
+bulk_cnt, cnt);
+
slab_unlock(page);
/*
 * Keep node_lock to preserve integrity
@@ -1210,7 +1228,8 @@ static inline int alloc_debug_processing(struct 
kmem_cache *s,
struct page *page, void *object, unsigned long addr) { return 0; }
 
 static inline struct kmem_cache_node *free_debug_processing(
-   struct kmem_cache *s, struct page *page, void *object,
+   struct kmem_cache *s, struct page *page,
+   void *head, void *tail, int bulk_cnt,
unsigned long addr, unsigned long *flags) { return NULL; }
 
 static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
@@ -1306,6 +1325,31 @@ static inline void slab_free_hook(struct kmem_cache *s, 
void *x)
kasan_slab_free(s, x);
 }
 
+/* Compiler cannot detect that slab_free_freelist_hook() can be
+ * removed if slab_free_hook() evaluates to nothing.  Thus, we need to
+ * catch all relevant config debug options here.
+ */
+#if defined(CONFIG_KMEMCHECK) ||   \
+   defined(CONFIG_LOCKDEP) ||  \
+   defined(CONFIG_DEBUG_KMEMLEAK) ||   \
+   defined(CONFIG_DEBUG_OBJECTS_FREE) ||   \
+   defined(CONFIG_KASAN)
+static inline void slab_free_freelist_hook(struct kmem_cache *s,
+  void *head, void *tail)
+{
+   void *object = head;
+   void *tail_obj = tail ? : head;
+
+   do {
+   slab_free_hook(s, object);
+   } while ((object != tail_obj) &&
+(object = get_freepointer(s, object)));
+}
+#else
+static inline void slab_free_freelist_hook(struct kmem_cache *s, void *obj_tail,
+  void *freelist_head) {}
+#endif
+
 static void setup_object(struct kmem_cache *s, struct page *page,
void *object)
 {
@@ -2586,10 +2630,11 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
  * handling required then we can return immediately.
  */
 static void 

[MM PATCH V4 2/6] slub: Avoid irqoff/on in bulk allocation

2015-09-29 Thread Jesper Dangaard Brouer
From: Christoph Lameter 

NOTICE: Accepted by AKPM
 
http://ozlabs.org/~akpm/mmots/broken-out/slub-avoid-irqoff-on-in-bulk-allocation.patch

Use the new function that can do allocation while
interrupts are disabled.  Avoids irq on/off sequences.

Signed-off-by: Christoph Lameter 
Signed-off-by: Jesper Dangaard Brouer 
---
 mm/slub.c |   24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)
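
The error handling in this patch is folded into a single exit label.  A
minimal userspace sketch of that shape, assuming malloc()/free() in
place of the slab calls: bulk_alloc() and its fail_at parameter are
hypothetical and exist only to make the failure path easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a bulk allocator with one error exit. */
static bool bulk_alloc(size_t obj_size, size_t nr, void **p, size_t fail_at)
{
	size_t i;

	assert(p != NULL);

	for (i = 0; i < nr; i++) {
		/* fail_at simulates an allocation failure for the demo. */
		p[i] = (i == fail_at) ? NULL : malloc(obj_size);
		if (!p[i])
			goto error;
	}
	return true;

error:
	/* Undo the i allocations that succeeded, then report failure. */
	while (i--) {
		free(p[i]);
		p[i] = NULL;
	}
	return false;
}
```

Every failure site jumps to the same label, so the cleanup is written
once and cannot drift between the individual error cases.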

diff --git a/mm/slub.c b/mm/slub.c
index 02cfb3a5983e..024eed32da2c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2821,30 +2821,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t 
flags, size_t size,
void *object = c->freelist;
 
if (unlikely(!object)) {
-   local_irq_enable();
/*
 * Invoking slow path likely have side-effect
 * of re-populating per CPU c->freelist
 */
-   p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
+   p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
_RET_IP_, c);
-   if (unlikely(!p[i])) {
-   __kmem_cache_free_bulk(s, i, p);
-   return false;
-   }
-   local_irq_disable();
+   if (unlikely(!p[i]))
+   goto error;
+
c = this_cpu_ptr(s->cpu_slab);
continue; /* goto for-loop */
}
 
/* kmem_cache debug support */
s = slab_pre_alloc_hook(s, flags);
-   if (unlikely(!s)) {
-   __kmem_cache_free_bulk(s, i, p);
-   c->tid = next_tid(c->tid);
-   local_irq_enable();
-   return false;
-   }
+   if (unlikely(!s))
+   goto error;
 
c->freelist = get_freepointer(s, object);
p[i] = object;
@@ -2864,6 +2857,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t 
flags, size_t size,
}
 
return true;
+
+error:
+   __kmem_cache_free_bulk(s, i, p);
+   local_irq_enable();
+   return false;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 



[MM PATCH V4 1/6] slub: create new ___slab_alloc function that can be called with irqs disabled

2015-09-29 Thread Jesper Dangaard Brouer
From: Christoph Lameter 

NOTICE: Accepted by AKPM
 
http://ozlabs.org/~akpm/mmots/broken-out/slub-create-new-___slab_alloc-function-that-can-be-called-with-irqs-disabled.patch

Bulk alloc needs a function like that because it enables interrupts before
calling __slab_alloc which promptly disables them again using the expensive
local_irq_save().

Signed-off-by: Christoph Lameter 
Signed-off-by: Jesper Dangaard Brouer 
---
 mm/slub.c |   44 +---
 1 file changed, 29 insertions(+), 15 deletions(-)
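
The split can be pictured with a small userspace model, given here only
as a sketch.  The irqs_off flag and the sim_irq_save()/sim_irq_restore()
helpers are hypothetical stand-ins for local_irq_save() and
local_irq_restore(); inner_alloc() plays the role of the new function
that expects interrupts to be off already, and outer_alloc() the role of
the wrapper that disables them itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt state; a real kernel uses local_irq_save()/restore(). */
static bool irqs_off;
static int irq_toggles;

static bool sim_irq_save(void)
{
	bool was_off = irqs_off;

	irqs_off = true;
	irq_toggles++;
	return was_off;
}

static void sim_irq_restore(bool was_off)
{
	irqs_off = was_off;
	irq_toggles++;
}

/* Inner worker: the caller must already have interrupts disabled. */
static int inner_alloc(int value)
{
	assert(irqs_off);
	return value + 1;
}

/* Outer wrapper: for callers that run with interrupts enabled. */
static int outer_alloc(int value)
{
	bool flags = sim_irq_save();
	int ret = inner_alloc(value);

	sim_irq_restore(flags);
	return ret;
}

/* Bulk path: disable once, then call the inner worker repeatedly. */
static int bulk_alloc_sum(int nr)
{
	bool flags = sim_irq_save();
	int sum = 0;

	for (int i = 0; i < nr; i++)
		sum += inner_alloc(i);
	sim_irq_restore(flags);
	return sum;
}
```

In this model the bulk path pays for one save/restore pair regardless of
how many objects it handles, which is the saving the patch is after.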

diff --git a/mm/slub.c b/mm/slub.c
index f614b5dc396b..02cfb3a5983e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2298,23 +2298,15 @@ static inline void *get_freelist(struct kmem_cache *s, 
struct page *page)
  * And if we were unable to get a new slab from the partial slab lists then
  * we need to allocate a new slab. This is the slowest path since it involves
  * a call to the page allocator and the setup of a new slab.
+ *
+ * Version of __slab_alloc to use when we know that interrupts are
+ * already disabled (which is the case for bulk allocation).
  */
-static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
  unsigned long addr, struct kmem_cache_cpu *c)
 {
void *freelist;
struct page *page;
-   unsigned long flags;
-
-   local_irq_save(flags);
-#ifdef CONFIG_PREEMPT
-   /*
-* We may have been preempted and rescheduled on a different
-* cpu before disabling interrupts. Need to reload cpu area
-* pointer.
-*/
-   c = this_cpu_ptr(s->cpu_slab);
-#endif
 
page = c->page;
if (!page)
@@ -2372,7 +2364,6 @@ load_freelist:
VM_BUG_ON(!c->page->frozen);
c->freelist = get_freepointer(s, freelist);
c->tid = next_tid(c->tid);
-   local_irq_restore(flags);
return freelist;
 
 new_slab:
@@ -2389,7 +2380,6 @@ new_slab:
 
if (unlikely(!freelist)) {
slab_out_of_memory(s, gfpflags, node);
-   local_irq_restore(flags);
return NULL;
}
 
@@ -2405,11 +2395,35 @@ new_slab:
deactivate_slab(s, page, get_freepointer(s, freelist));
c->page = NULL;
c->freelist = NULL;
-   local_irq_restore(flags);
return freelist;
 }
 
 /*
+ * Another one that disabled interrupt and compensates for possible
+ * cpu changes by refetching the per cpu area pointer.
+ */
+static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+ unsigned long addr, struct kmem_cache_cpu *c)
+{
+   void *p;
+   unsigned long flags;
+
+   local_irq_save(flags);
+#ifdef CONFIG_PREEMPT
+   /*
+* We may have been preempted and rescheduled on a different
+* cpu before disabling interrupts. Need to reload cpu area
+* pointer.
+*/
+   c = this_cpu_ptr(s->cpu_slab);
+#endif
+
+   p = ___slab_alloc(s, gfpflags, node, addr, c);
+   local_irq_restore(flags);
+   return p;
+}
+
+/*
  * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
  * have the fastpath folded into their functions. So no function call
  * overhead for requests that can be satisfied on the fastpath.



Re: [PATCH bluetooth-next 1/4] netlink: add nla_get for le32 and le64

2015-09-29 Thread Marcel Holtmann
Hi Dave,

> This patch adds missing inline wrappers for nla_get_le32 and
> nla_get_le64. The 802.15.4 MAC byteorder is little endian and we keep
> the byteorder for fields like address configuration in the same
> byteorder as it comes from the MAC layer.
> 
> To provide these fields for nl802154 userspace applications, we need
> these inline wrappers for netlink.
> 
> Cc: David S. Miller 
> Signed-off-by: Alexander Aring 
> ---
> include/net/netlink.h | 18 ++
> 1 file changed, 18 insertions(+)
> 
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 2a5dbcc..0e31727 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -1004,6 +1004,15 @@ static inline __be32 nla_get_be32(const struct nlattr 
> *nla)
> }
> 
> /**
> + * nla_get_le32 - return payload of __le32 attribute
> + * @nla: __le32 netlink attribute
> + */
> +static inline __le32 nla_get_le32(const struct nlattr *nla)
> +{
> + return *(__le32 *) nla_data(nla);
> +}
> +
> +/**
>  * nla_get_u16 - return payload of u16 attribute
>  * @nla: u16 netlink attribute
>  */
> @@ -1066,6 +1075,15 @@ static inline __be64 nla_get_be64(const struct nlattr 
> *nla)
> }
> 
> /**
> + * nla_get_le64 - return payload of __le64 attribute
> + * @nla: __le64 netlink attribute
> + */
> +static inline __le64 nla_get_le64(const struct nlattr *nla)
> +{
> + return *(__le64 *) nla_data(nla);
> +}
> +
> +/**
>  * nla_get_s32 - return payload of s32 attribute
>  * @nla: s32 netlink attribute
>  */

do you have any objections to me taking this change through the bluetooth-next 
tree?

Regards

Marcel
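
For context, a consumer of such a little-endian attribute still has to
convert the payload to host order before doing arithmetic on it.  A
minimal userspace sketch of that conversion follows; le32_from_bytes()
is a hypothetical helper and not part of the netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian u32 from raw payload bytes. */
static uint32_t le32_from_bytes(const unsigned char *p)
{
	assert(p != NULL);

	/* Byte 0 is the least significant byte in little-endian order. */
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Assembling the value byte by byte gives the same result on big-endian
and little-endian hosts.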



Re: [PATCH] net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set

2015-09-29 Thread David Miller
From: David Ahern 
Date: Mon, 28 Sep 2015 10:12:13 -0700

> Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
 ...
> The stack does consider the oif but a mismatch in rt6_device_match is not
> considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
> 
> Cc: Wolfgang Nothdurft 
> Signed-off-by: David Ahern 

Applied, thank you.


Re: RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad

2015-09-29 Thread David Miller
From: Liviu Dudau 
Date: Mon, 28 Sep 2015 17:51:51 +0100

> On some embedded systems the EEPROM does not contain a valid MAC address.
> In that case it is better to fallback to a generated mac address and
> let init scripts fix the value later.
> 
> Reported-by: Liviu Dudau 
> Signed-off-by: Stephen Hemminger 
> [Changed handcoded setup to use eth_hw_addr_random() and to save new address 
> into HW]
> Signed-off-by: Liviu Dudau 

Applied, thanks.


[PATCH 0/3] use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck
This patch set is meant to replace the calls to napi_schedule with
napi_schedule_irqoff as this should help to reduce the interrupt overhead
slightly by removing the unneeded call to local_irq_save and
local_irq_restore.

---

Alexander Duyck (3):
  ixgbe/ixgbevf: use napi_schedule_irqoff()
  i40e/i40evf: use napi_schedule_irqoff()
  fm10k: use napi_schedule_irqoff()


 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  |2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |6 --
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |2 +-
 5 files changed, 9 insertions(+), 7 deletions(-)



[PATCH 2/3] i40e/i40evf: use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck
The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.

They can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |6 --
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 484226e0365d..3cc97d4f5f70 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3281,7 +3281,7 @@ static irqreturn_t i40e_msix_clean_rings(int irq, void 
*data)
if (!q_vector->tx.ring && !q_vector->rx.ring)
return IRQ_HANDLED;
 
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }
@@ -3450,6 +3450,8 @@ static irqreturn_t i40e_intr(int irq, void *data)
 
/* only q0 is used in MSI/Legacy mode, and none are used in MSIX */
if (icr0 & I40E_PFINT_ICR0_QUEUE_0_MASK) {
+   struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+   struct i40e_q_vector *q_vector = vsi->q_vectors[0];
 
/* temporarily disable queue cause for NAPI processing */
u32 qval = rd32(hw, I40E_QINT_RQCTL(0));
@@ -3462,7 +3464,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
wr32(hw, I40E_QINT_TQCTL(0), qval);
 
if (!test_bit(__I40E_DOWN, &pf->state))
-   napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0]->napi);
+   napi_schedule_irqoff(&q_vector->napi);
}
 
if (icr0 & I40E_PFINT_ICR0_ADMINQ_MASK) {
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5e1336321c2f..4b3db099f58c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -334,7 +334,7 @@ static irqreturn_t i40evf_msix_clean_rings(int irq, void 
*data)
if (!q_vector->tx.ring && !q_vector->rx.ring)
return IRQ_HANDLED;
 
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }



[PATCH 3/3] fm10k: use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck
The fm10k_msix_clean_rings function runs from hard interrupt context or
with interrupts already disabled in netpoll.

It can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 74be792f3f1b..5fbffbaefe32 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -846,7 +846,7 @@ static irqreturn_t fm10k_msix_clean_rings(int 
__always_unused irq, void *data)
struct fm10k_q_vector *q_vector = data;
 
if (q_vector->rx.count || q_vector->tx.count)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }



Re: [PATCH net] skbuff: Fix skb checksum partial check.

2015-09-29 Thread David Miller
From: Pravin B Shelar 
Date: Mon, 28 Sep 2015 17:24:25 -0700

> Earlier patch 6ae459bda tried to detect void checksum partial
> skb by comparing pull length to checksum offset. But it does
> not work for all cases since checksum-offset depends on
> updates to skb->data.
> 
> Following patch fixes it by validating checksum start offset
> after skb-data pointer is updated. Negative value of checksum
> offset start means there is no need to checksum.
> 
> Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
> Reported-by: Andrew Vagin 
> Signed-off-by: Pravin B Shelar 
> ---
> This and 6ae459bda patches needs to be backported to stable.

Applied and both queued up for -stable, thanks.


Re: [PATCH net-next v2 00/11] net: L3 master device

2015-09-29 Thread David Ahern

On 9/29/15 5:23 PM, David Miller wrote:

From: David Ahern 
Date: Mon, 28 Sep 2015 10:16:50 -0700


v2
- rebased to top of net-next

- addressed Nik's comments (checking master, removing extra lines, and
   flipping the order of patches 1 and 2)


This still needs some work:

ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
scripts/Makefile.modpost:90: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1095: recipe for target 'modules' failed
make: *** [modules] Error 2



ugh. All of my builds have CONFIG_IPV6=y. Will kick out a v3 later.


Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes

2015-09-29 Thread Pravin Shelar
On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert
 wrote:
> Pravin,
>
> Another comment and question. Please see inline below.
>
> Thanks,
>
> --Tom
>
> On 9/24/15 7:42 PM, Pravin Shelar wrote:
>>
>> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert
>>  wrote:
>>>
>>> Add support for 802.1ad including the ability to push and pop double
>>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>>> conversion. Uses double nested encap attributes to represent double
>>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>>
>>> Signed-off-by: Thomas F Herbert 
>>> ---
>>>   net/openvswitch/flow.c |  83 +
>>>   net/openvswitch/flow.h |   5 ++
>>>   net/openvswitch/flow_netlink.c | 166
>>> ++---
>>>   3 files changed, 230 insertions(+), 24 deletions(-)
>>>
...

>>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>   {
>>>  struct ovs_key_ethernet *eth_key;
>>>  struct nlattr *nla, *encap;
>>> +   struct nlattr *in_encap = NULL;
>>>
>>>  if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
>>>  goto nla_put_failure;
>>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>  ether_addr_copy(eth_key->eth_src, output->eth.src);
>>>  ether_addr_copy(eth_key->eth_dst, output->eth.dst);
>>>
>>> -   if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
>>> +   if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) {
>>>  __be16 eth_type;
>>> -   eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
>>> +
>>> +   if (swkey->eth.cvlan.ctci ||
>>> +   eth_type_vlan(swkey->eth.cvlan.c_tpid))
>>> +   eth_type = !is_mask ? htons(ETH_P_8021AD) :
>>> + htons(0xffff);
>>> +   else
>>> +   eth_type = !is_mask ? htons(ETH_P_8021Q) :
>>> + htons(0xffff);
>>> +
>>
>> Here we can directly dump output->eth.type to netlink. No need to
>> check for inner encap.
>
> The eth.type is set to the inner encapsulated protocol, not to the tpid. We
> don't "know" what the outer tpid is, so I assume it is 802.1Q. To address this
> situation, do you think I should add the outer tpid to sw_flow_key?
> Also see comment above in flow.h.
>

With the addition of nested vlan, we need to add outer tpid. This will
simplify vlan netlink serialization too.


Poor IPv6 TCP performance in 4.3-rc3

2015-09-29 Thread Russell King - ARM Linux
Hi,

I'm seeing really poor IPv6 performance compared to IPv4.  I've
checked using two different ARM platforms - an iMX6 platform using
the FEC driver, and an Armada 38x using mvneta.

The following was captured using iperf between the target system
and my laptop.  The problem only occurs one-way.  The 4.3-rc3
platform is running iperf in server mode, the laptop is in client
mode.

Armada 38x:
ipv6: [  4]  0.0-23.9 sec   170 KBytes  58.3 Kbits/sec
ipv4: [  4]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

iMX6Q:
ipv6: [  4]  0.0-11.1 sec   640 KBytes   474 Kbits/sec
ipv4: [  4]  0.0-10.0 sec   655 MBytes   549 Mbits/sec

iMX6D with 4.2:
ipv6: [  4]  0.0-10.0 sec   685 MBytes   574 Mbits/sec
ipv4: [  4]  0.0-10.0 sec   696 MBytes   583 Mbits/sec

It looks like there's an IPv6 regression between 4.2 and 4.3-rc3.

Turning GRO off on Armada 38x gives:
ipv6: [  4]  0.0-10.0 sec  1.08 GBytes   923 Mbits/sec
ipv4: [  5]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

I haven't started to debug yet, but I thought I'd post a heads-up in
case it's a known problem.  I'll try to get some packet logs on
Thursday, and I'll try to bisect.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH net-next] bridge: vlan: add per-vlan struct and move to rhashtables

2015-09-29 Thread David Miller
From: Nikolay Aleksandrov 
Date: Fri, 25 Sep 2015 19:00:11 +0200

> This patch changes the bridge vlan implementation to use rhashtables
> instead of bitmaps.

This seems to be taking the code in a good direction, and I'm kinda
happy to see more rhashtable users in the tree as well.

So, applied to net-next, thanks.


Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer

2015-09-29 Thread David Miller
From: Alexander Stein 
Date: Mon, 28 Sep 2015 15:05:33 +0200

> Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
> Documentation/ABI/testing/sysfs-class-net does not state if this shall be
> signed or unsigned.
> Also remove the now unused variable fmt_udec.
> 
> Signed-off-by: Alexander Stein 

Applied, thanks.
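
The difference between the two format choices can be reproduced in
userspace.  This is only a sketch: format_speed() is a hypothetical
helper, and the cast to int32_t relies on the usual two's-complement
conversion, which is an assumption about the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a 32-bit speed value as signed or unsigned. */
static int format_speed(char *buf, size_t len, uint32_t speed, int as_signed)
{
	assert(buf != NULL && len > 0);

	if (as_signed)
		/* Assumes two's complement: 0xffffffff converts to -1. */
		return snprintf(buf, len, "%d", (int)(int32_t)speed);

	return snprintf(buf, len, "%u", (unsigned int)speed);
}
```

With the all-ones value, the unsigned format prints 4294967295 and the
signed format prints -1.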


[PATCH net-next 0/6] ila: Optimization to preserve value of early demux

2015-09-29 Thread Tom Herbert
In the current implementation of ILA, LWT is used to perform
translation on both the input and output paths. This is functional,
however there is a big performance hit in the receive path. Early
demux occurs before the routing lookup (a hit actually obviates the
route lookup). Therefore the stack currently performs early
demux before translation so that a local connection with ILA
addresses is never matched. Note that this issue is not just
with ILA, but pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving
the general problem seems non trivial since we would need to move
the route lookup before early demux thereby mitigating the value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by creating an
XFRM hook to perform address translation early in the receive path.
For the backend we implement an rhashtable that contains identifier
to locator mappings. The table also allows more specific matches
that include original locator and interface.

This patch set:
 - Add an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Add a start callback for starting a netlink dump.
 - Creates an ila directory under net/ipv6 and moves ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implement a table to do identifier->locator mapping. This is
   an rhashtable.
 - Configuration for the table with netlink.
 - Add XFRM xlat_addr facility. This includes a callback registration
   function and hook to call registered callbacks.
 - Call xfrm6_xlat_addr from ipv6_rcv before NF_HOOK and routing.

Testing:
   Running 200 netperf TCP_RR streams

No ILA, baseline
   85.72% CPU utilization
   1861945 tps
   93/163/330 50/90/99% latencies

ILA before fix (LWT on both input and output)
   83.47% CPU utilization
   16583186 tps (-11% from baseline)
   107/183/338 50/90/99% latencies

ILA after fix (hook for input)
   84.97% CPU utilization
   1833948 tps (-1.5% from baseline)
   95/164/331 50/90/99% latencies

Hacked DNPT to do ILA
   80.94% CPU utilization
   1683315 tps (-10% from baseline)
   104/179/350 50/90/99% latencies

Tom Herbert (6):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  xfrm: Add xfrm6 address translation function
  ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  ila: Add support for xfrm6_xlat_addr

 include/linux/netlink.h|   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h|   2 +
 include/net/xfrm.h |  25 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Kconfig   |   5 +
 net/ipv6/Makefile  |   3 +-
 net/ipv6/ila.c | 229 
 net/ipv6/ila/Makefile  |   7 +
 net/ipv6/ila/ila.h |  48 
 net/ipv6/ila/ila_common.c  | 103 
 net/ipv6/ila/ila_lwt.c | 152 +++
 net/ipv6/ila/ila_xlat.c| 642 +
 net/ipv6/ip6_input.c   |   3 +
 net/ipv6/xfrm6_policy.c|   7 +
 net/ipv6/xfrm6_xlat_addr.c |  66 +
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c|  16 ++
 18 files changed, 1188 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c
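
As a rough illustration of the identifier to locator lookup described
above, here is a minimal userspace sketch.  Everything in it is an
assumption made for the example: struct ila_addr, struct ila_map and
ila_xlat() are hypothetical names, the address is modelled as two
64-bit halves, and a linear scan stands in for the rhashtable.  It does
not model the more specific matches on original locator and interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of an IPv6 address into two 64-bit halves. */
struct ila_addr {
	uint64_t locator;
	uint64_t identifier;
};

/* One identifier -> locator mapping entry. */
struct ila_map {
	uint64_t identifier;
	uint64_t locator;
};

/* Rewrite the locator in place if the identifier has a mapping. */
static bool ila_xlat(struct ila_addr *addr,
		     const struct ila_map *tbl, size_t n)
{
	assert(addr != NULL);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].identifier == addr->identifier) {
			addr->locator = tbl[i].locator;
			return true;
		}
	}
	return false;
}
```

A miss leaves the address untouched, so packets without a mapping
continue through the normal receive path unchanged.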

-- 
2.4.6



[PATCH net-next 2/6] rhashtable: add function to replace an element

2015-09-29 Thread Tom Herbert
Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)
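
The core of the replace operation is a pointer-to-pointer walk over one
bucket chain.  The sketch below shows that walk in plain single-threaded
C, assuming no RCU and no bucket lock; struct node, its key field and
bucket_replace() are hypothetical, and comparing keys stands in for
comparing the hashes of the old and new objects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical chain element; key stands in for the object's hash. */
struct node {
	struct node *next;
	int key;
};

/*
 * Replace obj_old with obj_new in the chain starting at *bucket.
 * Returns 0 on success, -EINVAL if the keys differ, -ENOENT if
 * obj_old is not on the chain.
 */
static int bucket_replace(struct node **bucket, struct node *obj_old,
			  struct node *obj_new)
{
	struct node **pprev = bucket;
	struct node *he;

	assert(bucket && obj_old && obj_new);

	if (obj_old->key != obj_new->key)
		return -EINVAL;

	for (he = *bucket; he; he = he->next) {
		if (he != obj_old) {
			pprev = &he->next;
			continue;
		}
		/* Link the new object in first, then swing the predecessor. */
		obj_new->next = obj_old->next;
		*pprev = obj_new;
		return 0;
	}
	return -ENOENT;
}
```

Because pprev always points at the link that leads to the current
element, the head of the chain needs no special case.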

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+   struct rhashtable *ht, struct bucket_table *tbl,
+   struct rhash_head *obj_old, struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct rhash_head __rcu **pprev;
+   struct rhash_head *he;
+   spinlock_t *lock;
+   unsigned int hash;
+   int err = -ENOENT;
+
+   /* Minimally, the old and new objects must have same hash
+* (which should mean identifiers are the same).
+*/
+   hash = rht_head_hashfn(ht, tbl, obj_old, params);
+   if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+   return -EINVAL;
+
+   lock = rht_bucket_lock(tbl, hash);
+
+   spin_lock_bh(lock);
+
+   pprev = &tbl->buckets[hash];
+   rht_for_each(he, tbl, hash) {
+   if (he != obj_old) {
+   pprev = &he->next;
+   continue;
+   }
+
+   rcu_assign_pointer(obj_new->next, obj_old->next);
+   rcu_assign_pointer(*pprev, obj_new);
+   err = 0;
+   break;
+   }
+
+   spin_unlock_bh(lock);
+
+   return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:hash table
+ * @obj_old:   pointer to hash head inside object being replaced
+ * @obj_new:   pointer to hash head inside object which is new
+ * @params:hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+   struct rhashtable *ht, struct rhash_head *obj_old,
+   struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct bucket_table *tbl;
+   int err;
+
+   rcu_read_lock();
+
+   tbl = rht_dereference_rcu(ht->tbl, ht);
+
+   /* Because we have already taken (and released) the bucket
+* lock in old_tbl, if we find that future_tbl is not yet
+* visible then that guarantees the entry to still be in
+* the old tbl if it exists.
+*/
+   while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+   obj_new, params)) &&
+  (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+   ;
+
+   rcu_read_unlock();
+
+   return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6


