Re: [PATCH net v2] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Hannes Frederic Sowa
On Tue, Nov 10, 2015, at 16:18, Eric Dumazet wrote:
> Please Hannes include the Fixes: tag.

Yep, sorry, it is done in v3.

> As you might already know, patchwork does not catch it later
> 
> Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
> Acked-by: Eric Dumazet 
> 
> Also I would prefer skb_get() being on a separate line, to ease future
> understanding of the code.

Okay, indeed it is more visible. I changed the code, thanks!

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/6] net: dsa: mv88e6060: cleanup and fix setup

2015-11-10 Thread Neil Armstrong
This patchset introduces some fixes and a registers addressing cleanup for
the mv88e6060 DSA driver.

The first patch removes the poll_link callback, as was done for mv88e6xxx.
The following 3 patches fix the setup according to the datasheet.
The last 2 patches introduce a clean header and replace all magic values.

v2: clean up InitReady patch, add missing Acked-by and fix header copyright notice

Neil Armstrong (6):
  net: dsa: mv88e6060: remove poll_link callback
  net: dsa: mv88e6060: use the correct InitReady bit
  net: dsa: mv88e6060: use the correct MaxFrameSize bit
  net: dsa: mv88e6060: use the correct bit shift for mac0
  net: dsa: mv88e6060: add register defines header file
  net: dsa: mv88e6060: replace magic values with register defines

 drivers/net/dsa/mv88e6060.c | 114 +++-
 drivers/net/dsa/mv88e6060.h | 111 ++
 2 files changed, 149 insertions(+), 76 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6060.h

-- 
1.9.1


[PATCH v2 2/6] net: dsa: mv88e6060: use the correct InitReady bit

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the InitReady bit position
is 11 and the polarity is inverted.
Use the bit correctly to detect the end of initialization.

Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 6885ef5..eff5e18 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -102,7 +102,7 @@ static int mv88e6060_switch_reset(struct dsa_switch *ds)
timeout = jiffies + 1 * HZ;
while (time_before(jiffies, timeout)) {
ret = REG_READ(REG_GLOBAL, 0x00);
-   if ((ret & 0x8000) == 0x0000)
+   if (ret & 0x800)
break;

usleep_range(1000, 2000);
-- 
1.9.1
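
To make the changed bit test concrete, here is a minimal userspace sketch of the two checks. It assumes the removed comparison was against zero on bit 15, and the helper names (init_ready_old, init_ready_new) are made up for illustration; only the 0x8000 and 0x800 masks come from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 of the global status register, as named in the commit message. */
#define INIT_READY_BIT (1u << 11)	/* == 0x0800 */

/* Old check (assumed): bit 15 clear was taken to mean "ready". */
static bool init_ready_old(uint16_t status)
{
	return (status & 0x8000) == 0x0000;
}

/* New check from the patch: bit 11 set means initialization is done. */
static bool init_ready_new(uint16_t status)
{
	return (status & INIT_READY_BIT) != 0;
}
```

With a status word of 0x0000 the old check reports ready while the new one keeps polling, which is the behaviour change the commit message describes.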


Re: [PATCH v2 0/6] net: dsa: mv88e6060: cleanup and fix setup

2015-11-10 Thread Andrew Lunn
On Tue, Nov 10, 2015 at 04:51:09PM +0100, Neil Armstrong wrote:
> This patchset introduces some fixes and a registers addressing cleanup for
> the mv88e6060 DSA driver.

Hi Neil

It is normal on netdev for the email subject of a patch to say which
tree the patch is for. "net" would be the latest -rcX and is for
fixes only. "net-next" would be for new work aimed at the next merge
window.

So long as Dave does not complain, leave them as they are now. But
please try to follow this for your next patches.

Thanks
   Andrew


[PATCH] mac80211: Remove kerneldoc for beacon_loss_count

2015-11-10 Thread Thierry Reding
From: Thierry Reding <tred...@nvidia.com>

The beacon_loss_count field was removed from the structure in commit
976bd9efdae6 ("mac80211: move beacon_loss_count into ifmgd"). This
updates the kerneldoc comment to match the structure definition.

Signed-off-by: Thierry Reding <tred...@nvidia.com>
---
Applies on top of next-20151110.

 net/mac80211/sta_info.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 2cafb21b422f..fb77ece9a8c8 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -370,7 +370,6 @@ DECLARE_EWMA(signal, 1024, 8)
  * @uploaded: set to true when sta is uploaded to the driver
  * @sta: station information we share with the driver
  * @sta_state: duplicates information about station state (for debug)
- * @beacon_loss_count: number of times beacon loss has triggered
  * @rcu_head: RCU head used for freeing this station struct
  * @cur_max_bandwidth: maximum bandwidth to use for TX to the station,
  * taken from HT/VHT capabilities or VHT operating mode notification
-- 
2.5.0



Re: [PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Vivien Didelot
On Nov. Tuesday 10 (46) 03:42 PM, Neil Armstrong wrote:
> On 11/10/2015 03:25 PM, Vivien Didelot wrote:
> > Hi Neil,
> > 
> > On Nov. Tuesday 10 (46) 02:25 PM, Neil Armstrong wrote:
> >> To align with the mv88e6xxx code, add a similar header file
> >> with all the register defines.
> >> The file is based on the mv88e6xxx header for coherency.
> >>
> >> Signed-off-by: Neil Armstrong 
> > 
> > In the RFC patchset, Andrew mentioned that there are not that many things in
> > common with mv88e6xxx, so I don't really see the value in adding a separate
> > header file. Would it make sense to you guys to add the defines directly in
> > mv88e6060.c and squash that into the last patch?
> > 
> >> ---
> >>  drivers/net/dsa/mv88e6060.h | 108 
> >> 
> >>  1 file changed, 108 insertions(+)
> >>  create mode 100644 drivers/net/dsa/mv88e6060.h
> >>
> >> diff --git a/drivers/net/dsa/mv88e6060.h b/drivers/net/dsa/mv88e6060.h
> >> new file mode 100644
> >> index 000..adbc894
> >> --- /dev/null
> >> +++ b/drivers/net/dsa/mv88e6060.h
> >> @@ -0,0 +1,108 @@
> >> +/*
> >> + * net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
> >> + * Copyright (c) 2008 Marvell Semiconductor
> > 
> > Also I don't think the copyright notice is correct here.
> > 
> > Thanks,
> > -v
> > 
> Vivien,
> 
> Is something like this OK ?

I'd say yes.

> /*
>  * drivers/net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
>  * Copyright (c) 2015 Neil Armstrong
>  *
>  * Based on mv88e6xxx.h
>  * Copyright (c) 2008 Marvell Semiconductor
>  *
>  * This program is free software; you can redistribute it and/or modify
>  * it under the terms of the GNU General Public License as published by
>  * the Free Software Foundation; either version 2 of the License, or
>  * (at your option) any later version.
>  */

Thanks Neil,
-v


[PATCH net v3] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Hannes Frederic Sowa
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Cc: Eric Dumazet 
Acked-by: Eric Dumazet 
Signed-off-by: Hannes Frederic Sowa 
---
v2: add missing consume_skb in error path of recv_actor
v3: move skb_get to separate line as proposed by Eric Dumazet (thanks!)

 net/unix/af_unix.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index aaa0b58..12b886f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -441,6 +441,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
if (state == TCP_LISTEN)
unix_release_sock(skb->sk, 1);
/* passed fds are erased in the kfree_skb hook*/
+   UNIXCB(skb).consumed = skb->len;
kfree_skb(skb);
}
 
@@ -2072,6 +2073,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
 
do {
int chunk;
+   bool drop_skb;
struct sk_buff *skb, *last;
 
unix_state_lock(sk);
@@ -2152,7 +2154,11 @@ unlock:
}
 
chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
+   skb_get(skb);
chunk = state->recv_actor(skb, skip, chunk, state);
+   drop_skb = !unix_skb_len(skb);
+   /* skb is only safe to use if !drop_skb */
+   consume_skb(skb);
if (chunk < 0) {
if (copied == 0)
copied = -EFAULT;
@@ -2161,6 +2167,18 @@ unlock:
copied += chunk;
size -= chunk;
 
+   if (drop_skb) {
+   /* the skb was touched by a concurrent reader;
+* we should not expect anything from this skb
+* anymore and assume it invalid - we can be
+* sure it was dropped from the socket queue
+*
+* let's report a short read
+*/
+   err = 0;
+   break;
+   }
+
/* Mark read part of skb as used */
if (!(flags & MSG_PEEK)) {
UNIXCB(skb).consumed += chunk;
-- 
2.5.0
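
The pin-then-recheck pattern described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct buf, buf_get/buf_put and read_with_pin are hypothetical stand-ins for the skb, skb_get()/consume_skb() and the code around recv_actor; the "concurrent" reader is simulated by a callback, and no real locking is shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: a refcounted buffer with a
 * "consumed" offset, loosely modelled on UNIXCB(skb).consumed.
 */
struct buf {
	int refs;
	size_t len;
	size_t consumed;
	bool freed;	/* set instead of freeing, so the sketch can inspect it */
};

static void buf_get(struct buf *b)
{
	b->refs++;
}

static void buf_put(struct buf *b)
{
	if (--b->refs == 0)
		b->freed = true;
}

static size_t buf_remaining(const struct buf *b)
{
	return b->len - b->consumed;
}

/* Simulates a second reader that runs while the locks are dropped:
 * it consumes the whole buffer and drops the queue's reference.
 */
static void concurrent_reader_drains(struct buf *b)
{
	b->consumed = b->len;
	buf_put(b);
}

/* Simulates the undisturbed case: nobody else touches the buffer. */
static void actor_noop(struct buf *b)
{
	(void)b;
}

/* Pin the buffer, run the actor (locks would be dropped here), then
 * check whether someone else drained it. Returns true if the caller
 * should stop and report a short read.
 */
static bool read_with_pin(struct buf *b, void (*actor)(struct buf *))
{
	bool drop;

	buf_get(b);			/* pin before "unlocking" */
	actor(b);
	drop = buf_remaining(b) == 0;	/* safe: we still hold a reference */
	buf_put(b);			/* b may only be used again if !drop */
	return drop;
}
```

Calling read_with_pin() with concurrent_reader_drains returns true and the buffer is released only when the pinned reference is dropped; with actor_noop it returns false and the buffer stays alive.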



Re: [PATCH v2 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Vivien Didelot
On Nov. Tuesday 10 (46) 04:51 PM, Neil Armstrong wrote:
> To align with the mv88e6xxx code, add a similar header file
> with all the register defines.
> The file is based on the mv88e6xxx header for coherency.
> 
> Acked-by: Andrew Lunn 
> Signed-off-by: Neil Armstrong 

Acked-by: Vivien Didelot 


Re: [PATCH v2 2/6] net: dsa: mv88e6060: use the correct InitReady bit

2015-11-10 Thread Vivien Didelot
On Nov. Tuesday 10 (46) 04:51 PM, Neil Armstrong wrote:
> According to the mv88e6060 datasheet, the InitReady bit position
> is 11 and the polarity is inverted.
> Use the bit correctly to detect the end of initialization.
> 
> Acked-by: Andrew Lunn 
> Signed-off-by: Neil Armstrong 

Acked-by: Vivien Didelot 


[PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Mans Rullgard
This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
It is an almost complete rewrite of a driver originally found in
a Sigma Designs 2.6.22 tree.

Signed-off-by: Mans Rullgard 
---
Changes:
- Refactored mdio access functions
- Refactored register access helpers
- Improved error handling in rx buffer allocation
- Optimised some fifo parameters
- Overhauled tx dma. Multiple packets are now chained in a single dma
  operation if xmit_more is set, improving performance.
- Improved rx irq handling. It's not possible to disable interrupts
  entirely for napi poll, but they can be slowed down a little.
- Use readx_poll_timeout in various places
- Improved error detection
- Improved statistics
- Report hardware statistics counters through ethtool
- Improved tangox-specific setup
- Support for flow control using pause frames
- Explanatory comments added
- Various minor stylistic changes
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/aurora/Kconfig  |   20 +
 drivers/net/ethernet/aurora/Makefile |1 +
 drivers/net/ethernet/aurora/nb8800.c | 1530 ++
 drivers/net/ethernet/aurora/nb8800.h |  314 +++
 6 files changed, 1867 insertions(+)
 create mode 100644 drivers/net/ethernet/aurora/Kconfig
 create mode 100644 drivers/net/ethernet/aurora/Makefile
 create mode 100644 drivers/net/ethernet/aurora/nb8800.c
 create mode 100644 drivers/net/ethernet/aurora/nb8800.h

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 05aa759..8310163 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -29,6 +29,7 @@ source "drivers/net/ethernet/apm/Kconfig"
 source "drivers/net/ethernet/apple/Kconfig"
 source "drivers/net/ethernet/arc/Kconfig"
 source "drivers/net/ethernet/atheros/Kconfig"
+source "drivers/net/ethernet/aurora/Kconfig"
 source "drivers/net/ethernet/cadence/Kconfig"
 source "drivers/net/ethernet/adi/Kconfig"
 source "drivers/net/ethernet/broadcom/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index ddfc808..b435fb0 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_NET_XGENE) += apm/
 obj-$(CONFIG_NET_VENDOR_APPLE) += apple/
 obj-$(CONFIG_NET_VENDOR_ARC) += arc/
 obj-$(CONFIG_NET_VENDOR_ATHEROS) += atheros/
+obj-$(CONFIG_NET_VENDOR_AURORA) += aurora/
 obj-$(CONFIG_NET_CADENCE) += cadence/
 obj-$(CONFIG_NET_BFIN) += adi/
 obj-$(CONFIG_NET_VENDOR_BROADCOM) += broadcom/
diff --git a/drivers/net/ethernet/aurora/Kconfig b/drivers/net/ethernet/aurora/Kconfig
new file mode 100644
index 000..a3c7106
--- /dev/null
+++ b/drivers/net/ethernet/aurora/Kconfig
@@ -0,0 +1,20 @@
+config NET_VENDOR_AURORA
+   bool "Aurora VLSI devices"
+   help
+ If you have a network (Ethernet) device belonging to this class,
+ say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ questions about Aurora devices. If you say Y, you will be asked
+ for your specific device in the following questions.
+
+if NET_VENDOR_AURORA
+
+config AURORA_NB8800
+   tristate "Aurora AU-NB8800 support"
+   select PHYLIB
+   help
+Support for the AU-NB8800 gigabit Ethernet controller.
+
+endif
diff --git a/drivers/net/ethernet/aurora/Makefile b/drivers/net/ethernet/aurora/Makefile
new file mode 100644
index 000..6cb528a
--- /dev/null
+++ b/drivers/net/ethernet/aurora/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_AURORA_NB8800) += nb8800.o
diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
new file mode 100644
index 000..11cd389
--- /dev/null
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -0,0 +1,1530 @@
+/*
+ * Copyright (C) 2015 Mans Rullgard 
+ *
+ * Mostly rewritten, based on driver from Sigma Designs.  Original
+ * copyright notice below.
+ *
+ *
+ * Driver for tangox SMP864x/SMP865x/SMP867x/SMP868x builtin Ethernet Mac.
+ *
+ * Copyright (C) 2005 Maxime Bizon 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "nb8800.h"
+
+static int nb8800_dma_stop(struct net_device *dev);
+
+static inline u8 

Re: [PATCH] netfilter: Fix removal of GRE expectation entries created by PPTP

2015-11-10 Thread Pablo Neira Ayuso
On Tue, Nov 10, 2015 at 05:36:29PM +0100, Pablo Neira Ayuso wrote:
> From: Anthony Lineham 
> 
> The uninitialized tuple structure caused incorrect hash calculation
> and the lookup failed.

Please, ignore this.

This patch is already in the nf tree, it just slipped through in my
last git send-email.


Re: PING: [PATCH] net: smsc911x: Reset PHY during initialization

2015-11-10 Thread David Miller
From: Pavel Fedin 
Date: Tue, 10 Nov 2015 09:36:24 +0300

>  Hello! So, what should we do with this?

If you think I should reconsider the patch, you should resubmit it.
The ball is always in your court.


Re: [PATCH net v2] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Eric Dumazet
On Tue, 2015-11-10 at 15:47 +0100, Hannes Frederic Sowa wrote:
> During splicing an af-unix socket to a pipe we have to drop all
> af-unix socket locks. While doing so we allow another reader to enter
> unix_stream_read_generic which can read, copy and finally free another
> skb. If exactly this skb is just in process of being spliced we get a
> use-after-free report by kasan.
> 
> First, we must make sure to not have a free while the skb is used during
> the splice operation. We simply increment its use counter before unlocking
> the reader lock.
> 
> Stream sockets have the nice characteristic that we don't care about
> zero length writes and they never reach the peer socket's queue. That
> said, we can take the UNIXCB.consumed field as the indicator if the
> skb was already freed from the socket's receive queue. If the skb was
> fully consumed after we locked the reader side again we know it has been
> dropped by a second reader. We indicate a short read to user space and
> abort the current splice operation.
> 
> This bug has been found with syzkaller
> (http://github.com/google/syzkaller) by Dmitry Vyukov.
> 
> Reported-by: Dmitry Vyukov 
> Cc: Dmitry Vyukov 
> Cc: Eric Dumazet 
> Signed-off-by: Hannes Frederic Sowa 

Please Hannes include the Fixes: tag.

As you might already know, patchwork does not catch it later

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Acked-by: Eric Dumazet 

Also I would prefer skb_get() being on a separate line, to ease future
understanding of the code.

Thanks.





[PATCH v2 3/6] net: dsa: mv88e6060: use the correct MaxFrameSize bit

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the MaxFrameSize bit position
is 10, not 11, which is reserved.
Use the correct bit to set the max frame size to 1536.

Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index eff5e18..10647ad 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -119,7 +119,7 @@ static int mv88e6060_setup_global(struct dsa_switch *ds)
 * set the maximum frame size to 1536 bytes, and mask all
 * interrupt sources.
 */
-   REG_WRITE(REG_GLOBAL, 0x04, 0x0800);
+   REG_WRITE(REG_GLOBAL, 0x04, 0x400);

/* Enable automatic address learning, set the address
 * database size to 1024 entries, and set the default aging
-- 
1.9.1


[PATCH v2 1/6] net: dsa: mv88e6060: remove poll_link callback

2015-11-10 Thread Neil Armstrong
As in mv88e6xxx, remove the poll_link callback, since link
state change polling is now handled by phylib.

Tested on a mv88e6060 B0 device with a TI DM816X SoC.

Suggested-by: Andrew Lunn 
Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 49 -
 1 file changed, 49 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 9093577..6885ef5 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -225,54 +225,6 @@ mv88e6060_phy_write(struct dsa_switch *ds, int port, int regnum, u16 val)
return reg_write(ds, addr, regnum, val);
 }

-static void mv88e6060_poll_link(struct dsa_switch *ds)
-{
-   int i;
-
-   for (i = 0; i < DSA_MAX_PORTS; i++) {
-   struct net_device *dev;
-   int uninitialized_var(port_status);
-   int link;
-   int speed;
-   int duplex;
-   int fc;
-
-   dev = ds->ports[i];
-   if (dev == NULL)
-   continue;
-
-   link = 0;
-   if (dev->flags & IFF_UP) {
-   port_status = reg_read(ds, REG_PORT(i), 0x00);
-   if (port_status < 0)
-   continue;
-
-   link = !!(port_status & 0x1000);
-   }
-
-   if (!link) {
-   if (netif_carrier_ok(dev)) {
-   netdev_info(dev, "link down\n");
-   netif_carrier_off(dev);
-   }
-   continue;
-   }
-
-   speed = (port_status & 0x0100) ? 100 : 10;
-   duplex = (port_status & 0x0200) ? 1 : 0;
-   fc = ((port_status & 0xc000) == 0xc000) ? 1 : 0;
-
-   if (!netif_carrier_ok(dev)) {
-   netdev_info(dev,
-   "link up, %d Mb/s, %s duplex, flow control %sabled\n",
-   speed,
-   duplex ? "full" : "half",
-   fc ? "en" : "dis");
-   netif_carrier_on(dev);
-   }
-   }
-}
-
 static struct dsa_switch_driver mv88e6060_switch_driver = {
.tag_protocol   = DSA_TAG_PROTO_TRAILER,
.probe  = mv88e6060_probe,
@@ -280,7 +232,6 @@ static struct dsa_switch_driver mv88e6060_switch_driver = {
.set_addr   = mv88e6060_set_addr,
.phy_read   = mv88e6060_phy_read,
.phy_write  = mv88e6060_phy_write,
-   .poll_link  = mv88e6060_poll_link,
 };

 static int __init mv88e6060_init(void)
-- 
1.9.1
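
For readers mapping the magic masks in the removed mv88e6060_poll_link() to names, here is a small userspace sketch of the same decode. The PORT_STATUS_* names and bit positions are taken from the header in patch 5/6 of this series; struct link_state and decode_port_status() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Bit names as defined in patch 5/6 of this series. */
#define PORT_STATUS_PAUSE_EN	BIT(15)
#define PORT_STATUS_MY_PAUSE	BIT(14)
#define PORT_STATUS_FC		(PORT_STATUS_MY_PAUSE | PORT_STATUS_PAUSE_EN)
#define PORT_STATUS_LINK	BIT(12)
#define PORT_STATUS_DUPLEX	BIT(9)
#define PORT_STATUS_SPEED	BIT(8)

/* Hypothetical container for the decoded fields. */
struct link_state {
	bool link;
	int speed;		/* Mb/s */
	bool full_duplex;
	bool flow_control;
};

/* Same tests as the removed callback, with 0x1000, 0x0100, 0x0200 and
 * 0xc000 replaced by the named bits.
 */
static struct link_state decode_port_status(uint16_t st)
{
	struct link_state ls = {
		.link = (st & PORT_STATUS_LINK) != 0,
		.speed = (st & PORT_STATUS_SPEED) ? 100 : 10,
		.full_duplex = (st & PORT_STATUS_DUPLEX) != 0,
		.flow_control = (st & PORT_STATUS_FC) == PORT_STATUS_FC,
	};

	return ls;
}
```

A status word of 0xd300 decodes to link up, 100 Mb/s, full duplex, flow control enabled; 0x1000 decodes to link up, 10 Mb/s, half duplex, flow control disabled.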


[PATCH v2 4/6] net: dsa: mv88e6060: use the correct bit shift for mac0

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the first mac byte must
be at position 9 instead of 8, since bit 8 selects whether the
mac address must differ for each port for Pause frames.
Use the correct shift and set the same mac address for all ports.

Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 10647ad..cd08079 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -188,7 +188,8 @@ static int mv88e6060_setup(struct dsa_switch *ds)

 static int mv88e6060_set_addr(struct dsa_switch *ds, u8 *addr)
 {
-   REG_WRITE(REG_GLOBAL, 0x01, (addr[0] << 8) | addr[1]);
+   /* Use the same MAC Address as FD Pause frames for all ports */
+   REG_WRITE(REG_GLOBAL, 0x01, (addr[0] << 9) | addr[1]);
REG_WRITE(REG_GLOBAL, 0x02, (addr[2] << 8) | addr[3]);
REG_WRITE(REG_GLOBAL, 0x03, (addr[4] << 8) | addr[5]);

-- 
1.9.1
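
As an illustration of what the new expression writes, here is a userspace sketch that evaluates it for a sample address. It mirrors the expression in the patch and does not check it against the datasheet; the truncation to 16 bits is an assumption about the register width, mac01_reg() is a hypothetical helper, and GLOBAL_MAC_01_DIFF_ADDR is the name given to bit 8 in patch 5/6.

```c
#include <stdint.h>

/* Bit 8 of the first MAC register, named as in patch 5/6. */
#define GLOBAL_MAC_01_DIFF_ADDR	(1u << 8)

/* Evaluate the patched expression and keep the low 16 bits (assumed
 * register width). Bits of addr[0] shifted past bit 15 do not fit.
 */
static uint16_t mac01_reg(const uint8_t *addr)
{
	return (uint16_t)((addr[0] << 9) | addr[1]);
}
```

Since addr[0] is shifted to bit 9 and above and addr[1] occupies bits 7:0, bit 8 is always left clear by this expression, which matches the comment in the patch about using the same address for all ports.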


[PATCH v2 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Neil Armstrong
To align with the mv88e6xxx code, add a similar header file
with all the register defines.
The file is based on the mv88e6xxx header for coherency.

Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.h | 111 
 1 file changed, 111 insertions(+)
 create mode 100644 drivers/net/dsa/mv88e6060.h

diff --git a/drivers/net/dsa/mv88e6060.h b/drivers/net/dsa/mv88e6060.h
new file mode 100644
index 000..46c92b6
--- /dev/null
+++ b/drivers/net/dsa/mv88e6060.h
@@ -0,0 +1,111 @@
+/*
+ * drivers/net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
+ * Copyright (c) 2015 Neil Armstrong
+ *
+ * Based on mv88e6xxx.h
+ * Copyright (c) 2008 Marvell Semiconductor
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __MV88E6060_H
+#define __MV88E6060_H
+
+#define MV88E6060_PORTS 6
+
+#define REG_PORT(p) (0x8 + (p))
+#define PORT_STATUS 0x00
+#define PORT_STATUS_PAUSE_EN   BIT(15)
+#define PORT_STATUS_MY_PAUSE   BIT(14)
+#define PORT_STATUS_FC (PORT_STATUS_MY_PAUSE | PORT_STATUS_PAUSE_EN)
+#define PORT_STATUS_RESOLVED   BIT(13)
+#define PORT_STATUS_LINK   BIT(12)
+#define PORT_STATUS_PORTMODE   BIT(11)
+#define PORT_STATUS_PHYMODE BIT(10)
+#define PORT_STATUS_DUPLEX BIT(9)
+#define PORT_STATUS_SPEED  BIT(8)
+#define PORT_SWITCH_ID 0x03
+#define PORT_SWITCH_ID_6060 0x0600
+#define PORT_SWITCH_ID_6060_MASK   0xfff0
+#define PORT_SWITCH_ID_6060_R1 0x0601
+#define PORT_SWITCH_ID_6060_R2 0x0602
+#define PORT_CONTROL   0x04
+#define PORT_CONTROL_FORCE_FLOW_CTRL   BIT(15)
+#define PORT_CONTROL_TRAILER   BIT(14)
+#define PORT_CONTROL_HEADER BIT(11)
+#define PORT_CONTROL_INGRESS_MODE  BIT(8)
+#define PORT_CONTROL_VLAN_TUNNEL   BIT(7)
+#define PORT_CONTROL_STATE_MASK 0x03
+#define PORT_CONTROL_STATE_DISABLED 0x00
+#define PORT_CONTROL_STATE_BLOCKING 0x01
+#define PORT_CONTROL_STATE_LEARNING 0x02
+#define PORT_CONTROL_STATE_FORWARDING  0x03
+#define PORT_VLAN_MAP  0x06
+#define PORT_VLAN_MAP_DBNUM_SHIFT  12
+#define PORT_VLAN_MAP_TABLE_MASK   0x1f
+#define PORT_ASSOC_VECTOR  0x0b
+#define PORT_ASSOC_VECTOR_MONITOR  BIT(15)
+#define PORT_ASSOC_VECTOR_PAV_MASK 0x1f
+#define PORT_RX_CNTR   0x10
+#define PORT_TX_CNTR   0x11
+
+#define REG_GLOBAL 0x0f
+#define GLOBAL_STATUS  0x00
+#define GLOBAL_STATUS_SW_MODE_MASK (0x3 << 12)
+#define GLOBAL_STATUS_SW_MODE_0 (0x0 << 12)
+#define GLOBAL_STATUS_SW_MODE_1 (0x1 << 12)
+#define GLOBAL_STATUS_SW_MODE_2 (0x2 << 12)
+#define GLOBAL_STATUS_SW_MODE_3 (0x3 << 12)
+#define GLOBAL_STATUS_INIT_READY   BIT(11)
+#define GLOBAL_STATUS_ATU_FULL BIT(3)
+#define GLOBAL_STATUS_ATU_DONE BIT(2)
+#define GLOBAL_STATUS_PHY_INT  BIT(1)
+#define GLOBAL_STATUS_EEINT BIT(0)
+#define GLOBAL_MAC_01  0x01
+#define GLOBAL_MAC_01_DIFF_ADDR BIT(8)
+#define GLOBAL_MAC_23  0x02
+#define GLOBAL_MAC_45  0x03
+#define GLOBAL_CONTROL 0x04
+#define GLOBAL_CONTROL_DISCARD_EXCESS  BIT(13)
+#define GLOBAL_CONTROL_MAX_FRAME_1536  BIT(10)
+#define GLOBAL_CONTROL_RELOAD_EEPROM   BIT(9)
+#define GLOBAL_CONTROL_CTRMODE BIT(8)
+#define GLOBAL_CONTROL_ATU_FULL_EN BIT(3)
+#define GLOBAL_CONTROL_ATU_DONE_EN BIT(2)
+#define GLOBAL_CONTROL_PHYINT_EN   BIT(1)
+#define GLOBAL_CONTROL_EEPROM_DONE_EN  BIT(0)
+#define GLOBAL_ATU_CONTROL 0x0a
+#define GLOBAL_ATU_CONTROL_SWRESET BIT(15)
+#define GLOBAL_ATU_CONTROL_LEARNDIS BIT(14)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_256 (0x0 << 12)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_512 (0x1 << 12)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_1024 (0x2 << 12)
+#define GLOBAL_ATU_CONTROL_ATE_AGE_SHIFT   4
+#define GLOBAL_ATU_CONTROL_ATE_AGE_MASK (0xff << 4)
+#define GLOBAL_ATU_CONTROL_ATE_AGE_5MIN (0x13 << 4)
+#define GLOBAL_ATU_OP  0x0b
+#define GLOBAL_ATU_OP_BUSY BIT(15)
+#define GLOBAL_ATU_OP_NOP  (0 << 12)
+#define GLOBAL_ATU_OP_FLUSH_ALL ((1 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_UNLOCKED   ((2 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_LOAD_DB  ((3 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_GET_NEXT_DB  ((4 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_DB ((5 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_UNLOCKED_DB ((6 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_DATA 0x0c
+#define GLOBAL_ATU_DATA_PORT_VECTOR_MASK   0x3f0
+#define GLOBAL_ATU_DATA_PORT_VECTOR_SHIFT  4
+#define GLOBAL_ATU_DATA_STATE_MASK 0x0f
+#define 

[PATCH v2 6/6] net: dsa: mv88e6060: replace magic values with register defines

2015-11-10 Thread Neil Armstrong
To align with the mv88e6xxx code, use the register defines to
access all the register addresses and bit fields.

Acked-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 64 ++---
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index cd08079..0527f48 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -15,9 +15,7 @@
 #include 
 #include 
 #include 
-
-#define REG_PORT(p) (8 + (p))
-#define REG_GLOBAL 0x0f
+#include "mv88e6060.h"

 static int reg_read(struct dsa_switch *ds, int addr, int reg)
 {
@@ -67,13 +65,14 @@ static char *mv88e6060_probe(struct device *host_dev, int sw_addr)
if (bus == NULL)
return NULL;

-   ret = mdiobus_read(bus, sw_addr + REG_PORT(0), 0x03);
+   ret = mdiobus_read(bus, sw_addr + REG_PORT(0), PORT_SWITCH_ID);
if (ret >= 0) {
-   if (ret == 0x0600)
+   if (ret == PORT_SWITCH_ID_6060)
return "Marvell 88E6060 (A0)";
-   if (ret == 0x0601 || ret == 0x0602)
+   if (ret == PORT_SWITCH_ID_6060_R1 ||
+   ret == PORT_SWITCH_ID_6060_R2)
return "Marvell 88E6060 (B0)";
-   if ((ret & 0xfff0) == 0x0600)
+   if ((ret & PORT_SWITCH_ID_6060_MASK) == PORT_SWITCH_ID_6060)
return "Marvell 88E6060";
}

@@ -87,22 +86,26 @@ static int mv88e6060_switch_reset(struct dsa_switch *ds)
unsigned long timeout;

/* Set all ports to the disabled state. */
-   for (i = 0; i < 6; i++) {
-   ret = REG_READ(REG_PORT(i), 0x04);
-   REG_WRITE(REG_PORT(i), 0x04, ret & 0xfffc);
+   for (i = 0; i < MV88E6060_PORTS; i++) {
+   ret = REG_READ(REG_PORT(i), PORT_CONTROL);
+   REG_WRITE(REG_PORT(i), PORT_CONTROL,
+ ret & ~PORT_CONTROL_STATE_MASK);
}

/* Wait for transmit queues to drain. */
usleep_range(2000, 4000);

/* Reset the switch. */
-   REG_WRITE(REG_GLOBAL, 0x0a, 0xa130);
+   REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
+ GLOBAL_ATU_CONTROL_SWRESET |
+ GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
+ GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);

/* Wait up to one second for reset to complete. */
timeout = jiffies + 1 * HZ;
while (time_before(jiffies, timeout)) {
-   ret = REG_READ(REG_GLOBAL, 0x00);
-   if (ret & 0x800)
+   ret = REG_READ(REG_GLOBAL, GLOBAL_STATUS);
+   if (ret & GLOBAL_STATUS_INIT_READY)
break;

usleep_range(1000, 2000);
@@ -119,13 +122,15 @@ static int mv88e6060_setup_global(struct dsa_switch *ds)
 * set the maximum frame size to 1536 bytes, and mask all
 * interrupt sources.
 */
-   REG_WRITE(REG_GLOBAL, 0x04, 0x400);
+   REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, GLOBAL_CONTROL_MAX_FRAME_1536);

/* Enable automatic address learning, set the address
 * database size to 1024 entries, and set the default aging
 * time to 5 minutes.
 */
-   REG_WRITE(REG_GLOBAL, 0x0a, 0x2130);
+   REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
+ GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
+ GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);

return 0;
 }
@@ -139,25 +144,30 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, int p)
 * state to Forwarding.  Additionally, if this is the CPU
 * port, enable Ingress and Egress Trailer tagging mode.
 */
-   REG_WRITE(addr, 0x04, dsa_is_cpu_port(ds, p) ?  0x4103 : 0x0003);
+   REG_WRITE(addr, PORT_CONTROL,
+ dsa_is_cpu_port(ds, p) ?
+   PORT_CONTROL_TRAILER |
+   PORT_CONTROL_INGRESS_MODE |
+   PORT_CONTROL_STATE_FORWARDING :
+   PORT_CONTROL_STATE_FORWARDING);

/* Port based VLAN map: give each port its own address
 * database, allow the CPU port to talk to each of the 'real'
 * ports, and allow each of the 'real' ports to only talk to
 * the CPU port.
 */
-   REG_WRITE(addr, 0x06,
-   ((p & 0xf) << 12) |
-(dsa_is_cpu_port(ds, p) ?
-   ds->phys_port_mask :
-   (1 << ds->dst->cpu_port)));
+   REG_WRITE(addr, PORT_VLAN_MAP,
+ ((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
+  (dsa_is_cpu_port(ds, p) ?
+   ds->phys_port_mask :
+   BIT(ds->dst->cpu_port)));

/* Port Association Vector: when learning source addresses
 * of packets, add the 
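
A minimal userspace sketch of what the register defines buy over the magic values. The bit positions are assumptions inferred from the constants the diff removes (0xa130 and 0x2130 for the ATU control register, 0x4103 and 0x0003 for port control, 0xfffc for the port state mask); the authoritative definitions are in drivers/net/dsa/mv88e6060.h, which is not quoted here.

```c
#include <stdint.h>

/* Assumed bit layout, inferred from the magic values in the diff. */
#define BIT(n)                          (1u << (n))

#define GLOBAL_ATU_CONTROL_SWRESET      BIT(15)
#define GLOBAL_ATU_CONTROL_ATUSIZE_1024 (0x2u << 12)
#define GLOBAL_ATU_CONTROL_ATE_AGE_5MIN (0x13u << 4)

#define PORT_CONTROL_TRAILER            BIT(14)
#define PORT_CONTROL_INGRESS_MODE       BIT(8)
#define PORT_CONTROL_STATE_MASK         0x3u
#define PORT_CONTROL_STATE_FORWARDING   0x3u

/* Value written to the ATU control register, with or without a reset. */
static uint16_t atu_control_value(int reset)
{
    uint16_t val = GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
                   GLOBAL_ATU_CONTROL_ATE_AGE_5MIN;

    if (reset)
        val |= GLOBAL_ATU_CONTROL_SWRESET;
    return val;
}

/* Port control value: the CPU port additionally gets trailer tagging. */
static uint16_t port_control_value(int is_cpu_port)
{
    if (is_cpu_port)
        return PORT_CONTROL_TRAILER | PORT_CONTROL_INGRESS_MODE |
               PORT_CONTROL_STATE_FORWARDING;
    return PORT_CONTROL_STATE_FORWARDING;
}

/* Clear the port state field, i.e. put the port into the disabled state. */
static uint16_t port_disable(uint16_t reg)
{
    return (uint16_t)(reg & ~PORT_CONTROL_STATE_MASK);
}
```

Under these assumptions the named constants reproduce the old literals exactly, which is the property a "replace magic values" patch has to preserve.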

[PATCH nf 2/2] netfilter: nf_tables: add clone interface to expression operations

2015-11-10 Thread Pablo Neira Ayuso
With the conversion of the counter expressions to make it percpu, we
need to clone the percpu memory area, otherwise we crash when using
counters from flow tables.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h | 16 +++--
 net/netfilter/nft_counter.c   | 49 ---
 net/netfilter/nft_dynset.c|  5 ++--
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index c9149cc..c186457 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -630,6 +630,8 @@ struct nft_expr_ops {
int (*validate)(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nft_data **data);
+   int (*clone)(struct nft_expr *dst,
+const struct nft_expr *src);
const struct nft_expr_type  *type;
void*data;
 };
@@ -660,10 +662,20 @@ void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr);
 int nft_expr_dump(struct sk_buff *skb, unsigned int attr,
  const struct nft_expr *expr);
 
-static inline void nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
+static inline int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
 {
+   int err;
+
__module_get(src->ops->type->owner);
-   memcpy(dst, src, src->ops->size);
+   if (src->ops->clone) {
+   memcpy(dst, src, sizeof(*src));
+   err = src->ops->clone(dst, src);
+   if (err < 0)
+   return err;
+   } else {
+   memcpy(dst, src, src->ops->size);
+   }
+   return 0;
 }
 
 /**
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 1067fb4..c7808fc 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -47,27 +47,34 @@ static void nft_counter_eval(const struct nft_expr *expr,
local_bh_enable();
 }
 
-static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
+static void nft_counter_fetch(const struct nft_counter_percpu __percpu *counter,
+ struct nft_counter *total)
 {
-   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
-   struct nft_counter_percpu *cpu_stats;
-   struct nft_counter total;
+   const struct nft_counter_percpu *cpu_stats;
u64 bytes, packets;
unsigned int seq;
int cpu;
 
-   memset(&total, 0, sizeof(total));
+   memset(total, 0, sizeof(*total));
for_each_possible_cpu(cpu) {
-   cpu_stats = per_cpu_ptr(priv->counter, cpu);
+   cpu_stats = per_cpu_ptr(counter, cpu);
do {
seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
bytes   = cpu_stats->counter.bytes;
packets = cpu_stats->counter.packets;
} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
 
-   total.packets += packets;
-   total.bytes += bytes;
+   total->packets += packets;
+   total->bytes += bytes;
}
+}
+
+static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+   struct nft_counter total;
+
+   nft_counter_fetch(priv->counter, &total);
 
if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes)) ||
nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(total.packets)))
@@ -118,6 +125,31 @@ static void nft_counter_destroy(const struct nft_ctx *ctx,
free_percpu(priv->counter);
 }
 
+static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src)
+{
+   struct nft_counter_percpu_priv *priv = nft_expr_priv(src);
+   struct nft_counter_percpu_priv *priv_clone = nft_expr_priv(dst);
+   struct nft_counter_percpu __percpu *cpu_stats;
+   struct nft_counter_percpu *this_cpu;
+   struct nft_counter total;
+
+   nft_counter_fetch(priv->counter, &total);
+
+   cpu_stats = __netdev_alloc_pcpu_stats(struct nft_counter_percpu,
+ GFP_ATOMIC);
+   if (cpu_stats == NULL)
+   return -ENOMEM;
+
+   preempt_disable();
+   this_cpu = this_cpu_ptr(cpu_stats);
+   this_cpu->counter.packets = total.packets;
+   this_cpu->counter.bytes = total.bytes;
+   preempt_enable();
+
+   priv_clone->counter = cpu_stats;
+   return 0;
+}
+
 static struct nft_expr_type nft_counter_type;
 static const struct nft_expr_ops nft_counter_ops = {
.type   = &nft_counter_type,
@@ -126,6 +158,7 @@ static const struct nft_expr_ops 
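
A minimal userspace sketch of the optional clone hook described in this patch. The names `struct expr`, `struct expr_ops`, `counter_clone` and `expr_clone` are hypothetical stand-ins, not the nf_tables types: an expression that owns out-of-line state supplies a clone callback, and everything else falls back to a flat copy. The sketch sets the ops pointer directly before calling the hook and returns a negative errno on allocation failure; both are choices made for this illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct expr;

struct expr_ops {
    size_t size;    /* bytes a plain copy has to move */
    int (*clone)(struct expr *dst, const struct expr *src);  /* optional */
};

/* Hypothetical expression: a head (ops) plus one out-of-line counter. */
struct expr {
    const struct expr_ops *ops;
    unsigned long *counter;
};

/* Deep copy: the clone gets its own counter storage. */
static int counter_clone(struct expr *dst, const struct expr *src)
{
    unsigned long *c = malloc(sizeof(*c));

    if (c == NULL)
        return -ENOMEM;
    *c = *src->counter;
    dst->counter = c;
    return 0;
}

/* Use the clone hook when present, otherwise copy the bytes. */
static int expr_clone(struct expr *dst, const struct expr *src)
{
    if (src->ops->clone) {
        dst->ops = src->ops;
        return src->ops->clone(dst, src);
    }
    memcpy(dst, src, src->ops->size);
    return 0;
}

static const struct expr_ops counter_ops = { sizeof(struct expr), counter_clone };
static const struct expr_ops plain_ops   = { sizeof(struct expr), NULL };
```

With `plain_ops` the copy shares the source's counter pointer, which is the aliasing that a flat memcpy of an expression with separately allocated state would produce; with `counter_ops` each copy owns its storage.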

[PATCH] netfilter: Fix removal of GRE expectation entries created by PPTP

2015-11-10 Thread Pablo Neira Ayuso
From: Anthony Lineham 

The uninitialized tuple structure caused incorrect hash calculation
and the lookup failed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106441
Signed-off-by: Anthony Lineham 
---
Original patch posted on kernel bugzilla.

 net/ipv4/netfilter/nf_nat_pptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index 657d230..b3ca21b 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -45,7 +45,7 @@ static void pptp_nat_expected(struct nf_conn *ct,
struct net *net = nf_ct_net(ct);
const struct nf_conn *master = ct->master;
struct nf_conntrack_expect *other_exp;
-   struct nf_conntrack_tuple t;
+   struct nf_conntrack_tuple t = {};
const struct nf_ct_pptp_master *ct_pptp_info;
const struct nf_nat_pptp *nat_pptp_info;
struct nf_nat_range range;
-- 
2.1.4
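
A minimal userspace sketch of why an uninitialized lookup key breaks a hash lookup. `struct tuple`, the FNV-1a hash and the one-entry-per-bucket table are hypothetical stand-ins for the conntrack structures; the `background` byte models whatever happened to be on the stack, and 0 models the `= {}` initializer added by the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, padding-free stand-in for struct nf_conntrack_tuple. */
struct tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_key;
    uint16_t dst_key;
    uint8_t  protonum;
    uint8_t  other[3];   /* fields this caller never assigns */
};

#define NBUCKETS 64
static const struct tuple *table[NBUCKETS];

/* FNV-1a over every byte of the tuple, including unassigned fields. */
static uint32_t tuple_hash(const struct tuple *t)
{
    const unsigned char *p = (const unsigned char *)t;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < sizeof(*t); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static void table_insert(const struct tuple *t)
{
    table[tuple_hash(t) % NBUCKETS] = t;
}

static int table_contains(const struct tuple *t)
{
    const struct tuple *e = table[tuple_hash(t) % NBUCKETS];

    return e != NULL && memcmp(e, t, sizeof(*t)) == 0;
}

/* Build a key on top of a given "stack background" byte, assigning only
 * the fields the caller cares about. */
static struct tuple make_key(unsigned char background)
{
    struct tuple t;

    memset(&t, background, sizeof(t));
    t.src_ip = 0x0a000001;
    t.dst_ip = 0x0a000002;
    t.src_key = 0x1234;
    t.dst_key = 0x5678;
    t.protonum = 47;     /* GRE */
    return t;
}
```

A key built with `make_key(0)` finds an entry stored the same way, while a key built over a non-zero background does not, even though every field the caller assigned is identical.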



[PATCH nf 1/2] net: add __netdev_alloc_pcpu_stats() to indicate gfp flags

2015-11-10 Thread Pablo Neira Ayuso
nf_tables may create percpu counters from the packet path through its
dynamic set instantiation infrastructure, so we need a way to allocate
this through GFP_ATOMIC.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netdevice.h | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2c00772..e9d0c8a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2068,20 +2068,23 @@ struct pcpu_sw_netstats {
struct u64_stats_sync   syncp;
 };
 
-#define netdev_alloc_pcpu_stats(type)  \
-({ \
-   typeof(type) __percpu *pcpu_stats = alloc_percpu(type); \
-   if (pcpu_stats) {   \
-   int __cpu;  \
-   for_each_possible_cpu(__cpu) {  \
-   typeof(type) *stat; \
-   stat = per_cpu_ptr(pcpu_stats, __cpu);  \
-   u64_stats_init(&stat->syncp);   \
-   }   \
-   }   \
-   pcpu_stats; \
+#define __netdev_alloc_pcpu_stats(type, gfp)   \
+({ \
+   typeof(type) __percpu *pcpu_stats = alloc_percpu_gfp(type, gfp);\
+   if (pcpu_stats) {   \
+   int __cpu;  \
+   for_each_possible_cpu(__cpu) {  \
+   typeof(type) *stat; \
+   stat = per_cpu_ptr(pcpu_stats, __cpu);  \
+   u64_stats_init(&stat->syncp);   \
+   }   \
+   }   \
+   pcpu_stats; \
 })
 
+#define netdev_alloc_pcpu_stats(type)  \
+   __netdev_alloc_pcpu_stats(type, GFP_KERNEL)
+
 #include <linux/notifier.h>
 
 /* netdevice notifier chain. Please remember to update the rtnetlink
-- 
2.1.4
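
A minimal userspace sketch of the pattern this patch introduces: one allocator that takes the allocation context explicitly, plus a thin wrapper that keeps the old call signature. All names (`alloc_stats_gfp`, `alloc_stats`, `enum gfp_sketch`, `NR_CPUS_SKETCH`) are hypothetical, and `calloc` stands in for the percpu allocator; the `gfp` field exists only so the example can be checked.

```c
#include <stdlib.h>

#define NR_CPUS_SKETCH 4

/* Stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC (must not sleep). */
enum gfp_sketch { GFP_KERNEL_SKETCH, GFP_ATOMIC_SKETCH };

struct pcpu_stats {
    unsigned long long packets;
    unsigned long long bytes;
    int sync_ready;          /* stands in for u64_stats_init() */
    enum gfp_sketch gfp;     /* recorded for illustration only */
};

/* Allocate one slot per CPU and initialise each slot's sync state. */
static struct pcpu_stats *alloc_stats_gfp(enum gfp_sketch gfp)
{
    struct pcpu_stats *s = calloc(NR_CPUS_SKETCH, sizeof(*s));
    int cpu;

    if (s == NULL)
        return NULL;
    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        s[cpu].sync_ready = 1;
        s[cpu].gfp = gfp;
    }
    return s;
}

/* Existing callers keep their signature and get the sleeping variant. */
static struct pcpu_stats *alloc_stats(void)
{
    return alloc_stats_gfp(GFP_KERNEL_SKETCH);
}
```

Callers in process context keep using the short form, and a caller on the packet path can pass the atomic variant without duplicating the per-CPU initialisation loop.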



[PATCH net v2] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Hannes Frederic Sowa
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Cc: Eric Dumazet 
Signed-off-by: Hannes Frederic Sowa 
---
 net/unix/af_unix.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index aaa0b58..7770124 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -441,6 +441,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
if (state == TCP_LISTEN)
unix_release_sock(skb->sk, 1);
/* passed fds are erased in the kfree_skb hook*/
+   UNIXCB(skb).consumed = skb->len;
kfree_skb(skb);
}
 
@@ -2072,6 +2073,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
 
do {
int chunk;
+   bool drop_skb;
struct sk_buff *skb, *last;
 
unix_state_lock(sk);
@@ -2152,7 +2154,10 @@ unlock:
}
 
chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
-   chunk = state->recv_actor(skb, skip, chunk, state);
+   chunk = state->recv_actor(skb_get(skb), skip, chunk, state);
+   drop_skb = !unix_skb_len(skb);
+   /* skb is only safe to use if !drop_skb */
+   consume_skb(skb);
if (chunk < 0) {
if (copied == 0)
copied = -EFAULT;
@@ -2161,6 +2166,18 @@ unlock:
copied += chunk;
size -= chunk;
 
+   if (drop_skb) {
+   /* the skb was touched by a concurrent reader;
+* we should not expect anything from this skb
+* anymore and assume it invalid - we can be
+* sure it was dropped from the socket queue
+*
+* let's report a short read
+*/
+   err = 0;
+   break;
+   }
+
/* Mark read part of skb as used */
if (!(flags & MSG_PEEK)) {
UNIXCB(skb).consumed += chunk;
-- 
2.5.0
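
A minimal single-threaded sketch of the two ideas in this patch: hold an extra reference across the window in which the lock is dropped, and treat "nothing left to consume" as the sign that a concurrent reader took the buffer. `struct buf` and the helper names are hypothetical; the race is simulated by a plain function call, and a real implementation would use atomic reference counts.

```c
#include <stdbool.h>
#include <stdlib.h>

struct buf {
    int refcnt;
    unsigned int len;
    unsigned int consumed;
};

/* The queue owns the initial reference. */
static struct buf *buf_new(unsigned int len)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (b == NULL)
        return NULL;
    b->refcnt = 1;
    b->len = len;
    return b;
}

static struct buf *buf_get(struct buf *b)
{
    b->refcnt++;
    return b;
}

/* Returns true when the last reference was dropped and b was freed. */
static bool buf_put(struct buf *b)
{
    if (--b->refcnt == 0) {
        free(b);
        return true;
    }
    return false;
}

static unsigned int buf_remaining(const struct buf *b)
{
    return b->len - b->consumed;
}

/* What a second reader does while the first has dropped the lock: it
 * consumes the rest and drops the queue's reference. */
static bool concurrent_reader_drains(struct buf *queued)
{
    queued->consumed = queued->len;
    return buf_put(queued);
}

/* First reader: returns true if it has to report a short read. */
static bool splice_reader(struct buf *b, bool racing)
{
    bool drop;

    buf_get(b);                      /* own reference before "unlocking" */
    if (racing)
        concurrent_reader_drains(b); /* cannot free b: we hold a reference */
    drop = buf_remaining(b) == 0;    /* safe: b is still alive here */
    buf_put(b);                      /* frees b when the queue let go of it */
    return drop;
}
```

After `splice_reader()` returns true the buffer must not be touched again, which mirrors the comment in the patch that the skb is only safe to use if `!drop_skb`.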



Re: [PATCH, REPORT] bpf_trace: build error without PERF_EVENTS

2015-11-10 Thread Alexei Starovoitov
On Tue, Nov 10, 2015 at 09:25:01AM -0500, Steven Rostedt wrote:
> On Tue, 10 Nov 2015 14:31:38 +0100
> Daniel Borkmann  wrote:
> 
> > On 11/10/2015 01:55 PM, Arnd Bergmann wrote:
> > > In my ARM randconfig tests, I'm getting a build error for
> > > newly added code in bpf_perf_event_read and bpf_perf_event_output
> > > whenever CONFIG_PERF_EVENTS is disabled:
> > >
> > > kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
> > > > kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
> > > if (event->oncpu != smp_processor_id() ||
> > >   ^
> > > > kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
> > >event->pmu->count)
> > >
> > > This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
> > > is disabled. I'm not sure if that is a configuration we care
> > > > about, otherwise we could prevent this case from occurring by
> > > adding Kconfig dependencies.  
> > 
> > I think that seems better than spreading #if IS_ENABLEDs into the code.
> > Probably enough to add a 'depends on PERF_EVENTS' to config BPF_EVENTS,
> > so it's also explicitly documented.
> > 
> 
> So just do the following then?
> 
> -- Steve
> 
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 8d6363f42169..f5aecff2d243 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -434,7 +434,7 @@ config UPROBE_EVENT
>  
>  config BPF_EVENTS
>   depends on BPF_SYSCALL
> - depends on KPROBE_EVENT || UPROBE_EVENT
> + depends on (KPROBE_EVENT || UPROBE_EVENT) && PERF_EVENTS

yeah that's definitely cleaner and avoids ifdef creep in the future.
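
A small sketch of the dependency logic under discussion, written as C predicates purely for illustration. It assumes the rule ends up as "some probe event type and PERF_EVENTS", as Daniel suggested; the BPF_SYSCALL dependency is left out, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Old rule: any probe event type is enough. */
static bool bpf_events_old(bool kprobe, bool uprobe)
{
    return kprobe || uprobe;
}

/* Rule with the extra PERF_EVENTS dependency suggested in the thread. */
static bool bpf_events_new(bool kprobe, bool uprobe, bool perf_events)
{
    return (kprobe || uprobe) && perf_events;
}
```

The configuration from the report (UPROBE_EVENT on, KPROBE_EVENT off, PERF_EVENTS off) satisfies the old rule and is rejected by the new one, so the code using `struct perf_event` members is no longer built in that case.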



[PATCH -stable] virtio-net: drop NETIF_F_FRAGLIST

2015-11-10 Thread Charles (Chas) Williams
Dave, could you please add

commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39
virtio-net: drop NETIF_F_FRAGLIST

to your stable queues for 3.14.y and 4.1.y?

This fixes CVE-2015-5156,
https://security-tracker.debian.org/tracker/CVE-2015-5156






Re: [PATCH, REPORT] bpf_trace: build error without PERF_EVENTS

2015-11-10 Thread Daniel Borkmann

On 11/10/2015 06:14 PM, Alexei Starovoitov wrote:
> On Tue, Nov 10, 2015 at 09:25:01AM -0500, Steven Rostedt wrote:
> > On Tue, 10 Nov 2015 14:31:38 +0100
> > Daniel Borkmann  wrote:
> >
> > > On 11/10/2015 01:55 PM, Arnd Bergmann wrote:
> > > > In my ARM randconfig tests, I'm getting a build error for
> > > > newly added code in bpf_perf_event_read and bpf_perf_event_output
> > > > whenever CONFIG_PERF_EVENTS is disabled:
> > > >
> > > > kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
> > > > kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
> > > > if (event->oncpu != smp_processor_id() ||
> > > >   ^
> > > > kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
> > > > event->pmu->count)
> > > >
> > > > This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
> > > > is disabled. I'm not sure if that is a configuration we care
> > > > about, otherwise we could prevent this case from occurring by
> > > > adding Kconfig dependencies.
> > >
> > > I think that seems better than spreading #if IS_ENABLEDs into the code.
> > > Probably enough to add a 'depends on PERF_EVENTS' to config BPF_EVENTS,
> > > so it's also explicitly documented.
> >
> > So just do the following then?
> >
> > -- Steve
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 8d6363f42169..f5aecff2d243 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -434,7 +434,7 @@ config UPROBE_EVENT
> >
> >  config BPF_EVENTS
> >   depends on BPF_SYSCALL
> > - depends on KPROBE_EVENT || UPROBE_EVENT
> > + depends on (KPROBE_EVENT || UPROBE_EVENT) && PERF_EVENTS
>
> yeah that's definitely cleaner and avoids ifdef creep in the future.

Agreed, that's better.


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Eric Dumazet  writes:

> On Tue, 2015-11-10 at 16:14 +, Mans Rullgard wrote:
>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>> It is an almost complete rewrite of a driver originally found in
>> a Sigma Designs 2.6.22 tree.
>
> ...
>
>> +
>> +static int nb8800_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
>> +struct nb8800_priv *priv = netdev_priv(dev);
>> +struct nb8800_tx_desc *txd;
>> +struct nb8800_tx_buf *txb;
>> +struct nb8800_dma_desc *dma;
>> +dma_addr_t dma_addr;
>> +unsigned int dma_len;
>> +unsigned long flags;
>> +int align;
>> +int next;
>> +
>> +if (atomic_read(&priv->tx_free) <= NB8800_DESC_LOW) {
>> +netif_stop_queue(dev);
>> +return NETDEV_TX_BUSY;
>> +}
>> +
>> +align = (8 - (uintptr_t)skb->data) & 7;
>> +
>> +dma_len = skb->len - align;
>> +dma_addr = dma_map_single(&dev->dev, skb->data + align,
>> +  dma_len, DMA_TO_DEVICE);
>> +
>> +if (dma_mapping_error(&dev->dev, dma_addr)) {
>> +netdev_err(dev, "tx dma mapping error\n");
>> +kfree_skb(skb);
>> +dev->stats.tx_dropped++;
>> +return NETDEV_TX_OK;
>> +}
>> +
>> +if (atomic_dec_return(&priv->tx_free) <= NB8800_DESC_LOW)
>> +netif_stop_queue(dev);
>
> You probably also want to clear skb->xmit_more or risk a stall

Very true, will fix.

-- 
Måns Rullgård
m...@mansr.com
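
A minimal sketch of the alignment arithmetic in the quoted transmit path, `align = (8 - (uintptr_t)skb->data) & 7`. The helper names and the `struct tx_split` type are hypothetical; the expression itself is the one from the patch.

```c
#include <stdint.h>

/* Bytes from addr up to the next 8-byte boundary (0 if already aligned). */
static unsigned int bytes_to_align8(uintptr_t addr)
{
    return (unsigned int)((8 - addr) & 7);
}

struct tx_split {
    unsigned int head;      /* unaligned leading bytes */
    unsigned int dma_len;   /* bytes left for the aligned DMA mapping */
};

/* Assumes len is at least as large as the alignment gap; a real driver
 * has to decide what to do with shorter packets. */
static struct tx_split split_for_dma(uintptr_t data, unsigned int len)
{
    struct tx_split s;

    s.head = bytes_to_align8(data);
    s.dma_len = len - s.head;
    return s;
}
```

The subtraction wraps in unsigned arithmetic, so masking with 7 yields the distance to the next multiple of 8 for any address, including ones that are already aligned.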


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Mason
On 10/11/2015 17:14, Mans Rullgard wrote:

> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
> It is an almost complete rewrite of a driver originally found in
> a Sigma Designs 2.6.22 tree.
> 
> Signed-off-by: Mans Rullgard 
> ---
> Changes:
> - Refactored mdio access functions
> - Refactored register access helpers
> - Improved error handling in rx buffer allocation
> - Optimised some fifo parameters
> - Overhauled tx dma. Multiple packets are now chained in a single dma
>   operation if xmit_more is set, improving performance.
> - Improved rx irq handling. It's not possible to disable interrupts
>   entirely for napi poll, but they can be slowed down a little.
> - Use readx_poll_timeout in various places
> - Improved error detection
> - Improved statistics
> - Report hardware statistics counters through ethtool
> - Improved tangox-specific setup
> - Support for flow control using pause frames
> - Explanatory comments added
> - Various minor stylistic changes
> ---
>  drivers/net/ethernet/Kconfig |1 +
>  drivers/net/ethernet/Makefile|1 +
>  drivers/net/ethernet/aurora/Kconfig  |   20 +
>  drivers/net/ethernet/aurora/Makefile |1 +
>  drivers/net/ethernet/aurora/nb8800.c | 1530 ++
>  drivers/net/ethernet/aurora/nb8800.h |  314 +++
>  6 files changed, 1867 insertions(+)

The code has grown much since the previous patch, despite some
refactoring. Is this mostly due to ethtool_ops support?

 drivers/net/ethernet/aurora/nb8800.c | 1146 ++
 drivers/net/ethernet/aurora/nb8800.h |  230 +++

Regards.



Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue

2015-11-10 Thread Rainer Weikusat
Jason Baron  writes:
> On 11/09/2015 09:40 AM, Rainer Weikusat wrote:

[...]

>> -if (unix_peer(other) != sk && unix_recvq_full(other)) {
>> +if (!unix_dgram_peer_recv_ready(sk, other)) {
>>  if (!timeo) {
>> -err = -EAGAIN;
>> -goto out_unlock;
>> +if (unix_dgram_peer_wake_me(sk, other)) {
>> +err = -EAGAIN;
>> +goto out_unlock;
>> +}
>> +
>> +goto restart;
>>  }
>
>
> So this will cause 'unix_state_lock(other)' to be called twice in a
> row if we 'goto restart' (and hence will softlock the box). It just
> needs a 'unix_state_unlock(other);' before the 'goto restart'.

The goto restart was nonsense to begin with in this code path:
Restarting something is necessary after sleeping for some time but for
the case above, execution just continues. I've changed that (updated
patch should follow 'soon') to

if (!unix_dgram_peer_recv_ready(sk, other)) {
if (timeo) {
timeo = unix_wait_for_peer(other, timeo);

err = sock_intr_errno(timeo);
if (signal_pending(current))
goto out_free;

goto restart;
}

if (unix_dgram_peer_wake_me(sk, other)) {
err = -EAGAIN;
goto out_unlock;
}
}

> I also tested this patch with a single unix server and 200 client
> threads doing roughly epoll() followed by write() until -EAGAIN in a
> loop. The throughput for the test was roughly the same as current
> upstream, but the cpu usage was a lot higher. I think it's because this patch
> takes the server wait queue lock in the _poll() routine. This causes a
> lot of contention. The previous patch you posted for this where you did
> not clear the wait queue on every wakeup and thus didn't need the queue
> lock in poll() (unless we were adding to it), performed much better.

I'm somewhat unsure what to make of that: The previous patch would also
take the wait queue lock whenever poll was about to return 'not
writable' because of the length of the server receive queue unless
another thread using the same client socket also noticed this and
enqueued this same socket already. And "hundreds of clients using a
single client socket in order to send data to a single server socket"
doesn't seem very realistic to me.

Also, this code shouldn't usually be executed as the server should
usually be capable of keeping up with the data sent by clients. If it's
permanently incapable of that, you're effectively performing a
(successful) DDOS against it. Which should result in "high CPU
utilization" in either case. It may be possible to improve this by
tuning/changing the flow control mechanism. Off the top of my head, I'd suggest
making the queue longer (the default value is 10) and delaying wake ups
until the server actually did catch up, IOW, the receive queue is empty
or almost empty. But this ought to be done with a different patch.
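
A minimal sketch of the flow-control idea suggested above: refuse writers once the receive queue is full, and wake them only after the reader has drained the queue down to a low-water mark instead of on every dequeue. `struct recvq`, `enqueue`, `dequeue` and the `low_water` field are hypothetical and not part of any posted patch.

```c
#include <stdbool.h>

struct recvq {
    unsigned int len;        /* datagrams currently queued */
    unsigned int max;        /* queue limit */
    unsigned int low_water;  /* wake writers at or below this length */
    bool writers_blocked;    /* at least one writer hit the limit */
};

/* Returns false when the queue is full and the writer has to wait. */
static bool enqueue(struct recvq *q)
{
    if (q->len >= q->max) {
        q->writers_blocked = true;
        return false;
    }
    q->len++;
    return true;
}

/* Returns true when blocked writers should be woken up. */
static bool dequeue(struct recvq *q)
{
    if (q->len > 0)
        q->len--;
    if (q->writers_blocked && q->len <= q->low_water) {
        q->writers_blocked = false;
        return true;
    }
    return false;
}
```

With a limit of 10 and a low-water mark of 1, a full queue produces a single wake-up after nine dequeues rather than one per dequeue, which is the kind of batching the paragraph above is asking for.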


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Eric Dumazet  writes:

> On Tue, 2015-11-10 at 16:14 +, Mans Rullgard wrote:
>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>> It is an almost complete rewrite of a driver originally found in
>> a Sigma Designs 2.6.22 tree.
>
> ...
>
>> +
>> +static void nb8800_receive(struct net_device *dev, int i, int len)
>> +{
>> +struct nb8800_priv *priv = netdev_priv(dev);
>> +struct nb8800_rx_desc *rxd = &priv->rx_descs[i];
>> +struct page *page = priv->rx_bufs[i].page;
>> +int offset = priv->rx_bufs[i].offset;
>> +void *data = page_address(page) + offset;
>> +dma_addr_t dma = rxd->desc.s_addr;
>> +struct sk_buff *skb;
>> +int size;
>> +int err;
>> +
>> +size = len <= RX_COPYBREAK ? len : RX_COPYHDR;
>> +
>> +skb = napi_alloc_skb(&priv->napi, size);
>> +if (!skb) {
>> +netdev_err(dev, "rx skb allocation failed\n");
>> +dev->stats.rx_dropped++;
>> +return;
>> +}
>> +
>> +if (len <= RX_COPYBREAK) {
>> +dma_sync_single_for_cpu(&dev->dev, dma, len, DMA_FROM_DEVICE);
>> +memcpy(skb_put(skb, len), data, len);
>> +dma_sync_single_for_device(&dev->dev, dma, len,
>> +   DMA_FROM_DEVICE);
>> +} else {
>> +err = nb8800_alloc_rx(dev, i, true);
>> +if (err) {
>> +netdev_err(dev, "rx buffer allocation failed\n");
>> +dev->stats.rx_dropped++;
>> +return;
>> +}
>> +
>> +dma_unmap_page(&dev->dev, dma, RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +memcpy(skb_put(skb, RX_COPYHDR), data, RX_COPYHDR);
>> +skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
>> +offset + RX_COPYHDR, len - RX_COPYHDR,
>> +RX_BUF_SIZE);
>> +}
>> +
>> +skb->protocol = eth_type_trans(skb, dev);
>> +netif_receive_skb(skb);
>> +}
>> +
>
> Any reason you do not use napi_gro_receive(&priv->napi, skb) instead of
> netif_receive_skb() ?

Because I haven't been following the netdev list closely for the last
five years, and no documentation I read mentioned this function.  I can
certainly change it.

-- 
Måns Rullgård
m...@mansr.com
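
A minimal sketch of the copybreak decision in the quoted receive path: short frames are copied whole into a small buffer, while long frames get only a header's worth copied and the rest attached as a page fragment. The two threshold values and the `struct rx_plan` type are hypothetical; the driver's actual `RX_COPYBREAK` and `RX_COPYHDR` values are not shown in the thread.

```c
#include <stddef.h>

#define RX_COPYBREAK 256   /* hypothetical threshold */
#define RX_COPYHDR   128   /* hypothetical header copy size */

struct rx_plan {
    size_t linear;  /* bytes copied into the skb's linear area */
    size_t frag;    /* bytes left in the page and attached as a fragment */
};

/* Decide how a received frame of len bytes is split. */
static struct rx_plan rx_plan_for(size_t len)
{
    struct rx_plan p;

    if (len <= RX_COPYBREAK) {
        p.linear = len;
        p.frag = 0;
    } else {
        p.linear = RX_COPYHDR;
        p.frag = len - RX_COPYHDR;
    }
    return p;
}
```

The trade-off is that small frames cost one memcpy but let the receive buffer be reused immediately, whereas large frames avoid copying the payload at the price of allocating a replacement buffer.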


Re: [PATCH nf 2/2] netfilter: nf_tables: add clone interface to expression operations

2015-11-10 Thread Patrick McHardy
On 10.11, Pablo Neira Ayuso wrote:
> On Tue, Nov 10, 2015 at 06:30:34PM +, Patrick McHardy wrote:
> > >   __module_get(src->ops->type->owner);
> > > - memcpy(dst, src, src->ops->size);
> > > + if (src->ops->clone) {
> > > + memcpy(dst, src, sizeof(*src));
> > 
> > Why copy if we clone? The function should do a full initialization if it is
> > present I would say.
> 
> This is not copying the variable length data area of the expression,
> just the expression head.

Ah right. But that is only ->ops. We can set this directly, should generate
better code and be easier to understand.

> 
> > > + err = src->ops->clone(dst, src);
> > > + if (err < 0)
> > > + return err;
> > > + } else {
> > > + memcpy(dst, src, src->ops->size);
> > > + }
> > > + return 0;
> > >  }
> > >  
> 


Re: [PATCH nf 2/2] netfilter: nf_tables: add clone interface to expression operations

2015-11-10 Thread Patrick McHardy
On 10.11, Pablo Neira Ayuso wrote:
> With the conversion of the counter expressions to make it percpu, we
> need to clone the percpu memory area, otherwise we crash when using
> counters from flow tables.
> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  include/net/netfilter/nf_tables.h | 16 +++--
>  net/netfilter/nft_counter.c   | 49 ---
>  net/netfilter/nft_dynset.c|  5 ++--
>  3 files changed, 58 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index c9149cc..c186457 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -630,6 +630,8 @@ struct nft_expr_ops {
>   int (*validate)(const struct nft_ctx *ctx,
>   const struct nft_expr *expr,
>   const struct nft_data **data);
> + int (*clone)(struct nft_expr *dst,
> +  const struct nft_expr *src);

The functions and data needed during runtime are deliberately kept together
at the beginning of the structure to avoid having to read the entire thing.
So I'd say this should go after ->eval().

> @@ -660,10 +662,20 @@ void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr);
>  int nft_expr_dump(struct sk_buff *skb, unsigned int attr,
> const struct nft_expr *expr);
>  
> -static inline void nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
> +static inline int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
>  {
> + int err;
> +
>   __module_get(src->ops->type->owner);
> - memcpy(dst, src, src->ops->size);
> + if (src->ops->clone) {
> + memcpy(dst, src, sizeof(*src));

Why copy if we clone? The function should do a full initialization if it is
present I would say.

> + err = src->ops->clone(dst, src);
> + if (err < 0)
> + return err;
> + } else {
> + memcpy(dst, src, src->ops->size);
> + }
> + return 0;
>  }
>  


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Mason  writes:

> On 10/11/2015 17:14, Mans Rullgard wrote:
>
>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>> It is an almost complete rewrite of a driver originally found in
>> a Sigma Designs 2.6.22 tree.
>> 
>> Signed-off-by: Mans Rullgard 
>> ---
>> Changes:
>> - Refactored mdio access functions
>> - Refactored register access helpers
>> - Improved error handling in rx buffer allocation
>> - Optimised some fifo parameters
>> - Overhauled tx dma. Multiple packets are now chained in a single dma
>>   operation if xmit_more is set, improving performance.
>> - Improved rx irq handling. It's not possible to disable interrupts
>>   entirely for napi poll, but they can be slowed down a little.
>> - Use readx_poll_timeout in various places
>> - Improved error detection
>> - Improved statistics
>> - Report hardware statistics counters through ethtool
>> - Improved tangox-specific setup
>> - Support for flow control using pause frames
>> - Explanatory comments added
>> - Various minor stylistic changes
>> ---
>>  drivers/net/ethernet/Kconfig |1 +
>>  drivers/net/ethernet/Makefile|1 +
>>  drivers/net/ethernet/aurora/Kconfig  |   20 +
>>  drivers/net/ethernet/aurora/Makefile |1 +
>>  drivers/net/ethernet/aurora/nb8800.c | 1530 ++
>>  drivers/net/ethernet/aurora/nb8800.h |  314 +++
>>  6 files changed, 1867 insertions(+)
>
> The code has grown much since the previous patch, despite some
> refactoring. Is this mostly due to ethtool_ops support?
>
>  drivers/net/ethernet/aurora/nb8800.c | 1146 ++
>  drivers/net/ethernet/aurora/nb8800.h |  230 +++

Some of the increase is from new features, some from improvements, and
then there are a bunch of new comments.

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH nf 2/2] netfilter: nf_tables: add clone interface to expression operations

2015-11-10 Thread Patrick McHardy
On 10.11, Pablo Neira Ayuso wrote:
> On Tue, Nov 10, 2015 at 06:58:05PM +, Patrick McHardy wrote:
> > On 10.11, Pablo Neira Ayuso wrote:
> > > On Tue, Nov 10, 2015 at 06:30:34PM +, Patrick McHardy wrote:
> > > > >   __module_get(src->ops->type->owner);
> > > > > - memcpy(dst, src, src->ops->size);
> > > > > + if (src->ops->clone) {
> > > > > + memcpy(dst, src, sizeof(*src));
> > > > 
> > > > > Why copy if we clone? The function should do a full initialization
> > > > > if it is present I would say.
> > > 
> > > This is not copying the variable length data area of the expression,
> > > just the expression head.
> > 
> > Ah right. But that is only ->ops. We can set this directly, should generate
> > better code and be easier to understand.
> 
> I left the memcpy just to avoid that we forget in case we ever get
> more data there (unlikely). So I'll set the pointer instead.
> 
> If no further objections, will make those two changes locally and will
> push this upstream.

No further objections :)


Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue

2015-11-10 Thread Rainer Weikusat
David Miller  writes:
> From: Rainer Weikusat 
> Date: Mon, 09 Nov 2015 14:40:48 +
>
>> +__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
>> +&u->peer_wake);
>
> This is more simply:
>
>   __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait, q);

Thank you for pointing this out.



Re: [PATCH -stable] virtio-net: drop NETIF_F_FRAGLIST

2015-11-10 Thread David Miller
From: "Charles (Chas) Williams" <3ch...@gmail.com>
Date: Tue, 10 Nov 2015 13:26:05 -0500

> Dave, could you please add
> 
> commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39
> virtio-net: drop NETIF_F_FRAGLIST
> 
> to your stable queues for 3.14.y and 4.1.y?

Ok, queued up.


Re: [PATCH v8] can: xilinx: Convert to runtime_pm

2015-11-10 Thread Marc Kleine-Budde
On 10/26/2015 07:11 AM, Kedareswara rao Appana wrote:
> Instead of enabling/disabling clocks at several locations in the driver,
> use the runtime_pm framework. This consolidates the actions for runtime PM
> in the appropriate callbacks and makes the driver more readable and
> maintainable.
> 
> Signed-off-by: Kedareswara rao Appana 

Applied to can-next.

thanks,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Tue, 10 Nov 2015 18:05:15 +
>
>> Because I haven't been following the netdev list closely for the last
>> five years, and no documentation I read mentioned this function.  I can
>> certainly change it.
>
> It is always advisable to mimic what other drivers do and use them as
> a reference, rather than depend upon documentation which by definition
> is always going to be out of sync with the source tree.

Sure.  The trick is to pick the right driver(s) to use as reference.
Quite a few of them don't use that function.

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH] bpf_trace: Make dependent on PERF_EVENTS

2015-11-10 Thread Steven Rostedt
On Tue, 10 Nov 2015 15:40:35 -0500 (EST)
David Miller  wrote:


> I'll apply this, thanks Steven et al.

Thanks David.

-- Steve



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread David Miller
From: Måns Rullgård 
Date: Tue, 10 Nov 2015 20:53:19 +

> David Miller  writes:
> 
>> From: Måns Rullgård 
>> Date: Tue, 10 Nov 2015 18:05:15 +
>>
>>> Because I haven't been following the netdev list closely for the last
>>> five years, and no documentation I read mentioned this function.  I can
>>> certainly change it.
>>
>> It is always advisable to mimic what other drivers do and use them as
>> a reference, rather than depend upon documentation which by definition
>> is always going to be out of sync with the source tree.
> 
> Sure.  The trick is to pick the right driver(s) to use as reference.
> Quite a few of them don't use that function.

If you really are stumped on this matter, start at least with the
ixgbe driver.  In fact pretty much every Intel ethernet driver is
a reasonable reference.  Others to check out are bnx2x and mlx5.



Re: [PATCH] arm64: bpf: fix JIT stack setup

2015-11-10 Thread Shi, Yang

On 11/9/2015 12:00 PM, Z Lim wrote:
> On Mon, Nov 9, 2015 at 10:08 AM, Shi, Yang  wrote:
>> I added it to stay aligned with the ARMv8 AAPCS, to maintain the correct FP
>> during function calls. It lets us get a correct stack backtrace.
>>
>> I think we'd better keep compliant with the ARMv8 AAPCS in the BPF JIT
>> prologue too.
>>
>> If nobody thinks it is necessary, we definitely could remove that change.
>
> Oh no, I don't think anyone will say it's unnecessary!
> I agree the A64_FP-related change is a good idea, so stack unwinding works.
>
> How about splitting this into two patches? One for the BPF-related
> bug, and another for A64 FP-handling.

I'm not sure if this is a good approach or not. IMHO, they are kind of
atomic. Without A64 FP-handling, that fix looks incomplete and
introduces another problem (stack backtrace).

Thanks,
Yang

> Thanks again for tracking this down and improving things overall for arm64 :)
>
>> Thanks,
>> Yang



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread David Miller
From: Måns Rullgård 
Date: Tue, 10 Nov 2015 18:05:15 +

> Because I haven't been following the netdev list closely for the last
> five years, and no documentation I read mentioned this function.  I can
> certainly change it.

It is always advisable to mimic what other drivers do and use them as
a reference, rather than depend upon documentation which by definition
is always going to be out of sync with the source tree.


[PATCH] bpf_trace: Make dependent on PERF_EVENTS

2015-11-10 Thread Steven Rostedt

Arnd Bergmann reported:

  In my ARM randconfig tests, I'm getting a build error for
  newly added code in bpf_perf_event_read and bpf_perf_event_output
  whenever CONFIG_PERF_EVENTS is disabled:

  kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
  kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
  if (event->oncpu != smp_processor_id() ||
           ^
  kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
        event->pmu->count)

  This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
  is disabled. I'm not sure if that is a configuration we care
  about, otherwise we could prevent this case from occurring by
  adding Kconfig dependencies.

Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS.
By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine.

Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel
Reported-by: Arnd Bergmann 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8d6363f42169..e45db6b0d878 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -434,7 +434,7 @@ config UPROBE_EVENT
 
 config BPF_EVENTS
        depends on BPF_SYSCALL
-       depends on KPROBE_EVENT || UPROBE_EVENT
+       depends on (KPROBE_EVENT || UPROBE_EVENT) && PERF_EVENTS
        bool
        default y
        help
-- 
1.8.3.1



Re: [PATCH] Revert "bridge: Allow forward delay to be cfgd when STP enabled"

2015-11-10 Thread David Miller
From: Vladislav Yasevich 
Date: Tue, 10 Nov 2015 06:15:32 -0500

> This reverts commit 34c2d9fb0498c066afbe610b15e18995fd8be792.
> 
> There are 2 reasons for this revert:
>  1)  The commit in question doesn't do what it says it does.  The
>  description reads: "Allow bridge forward delay to be configured
>  when Spanning Tree is enabled."  This was already the case before
>  the commit was made.  What the commit actually did was disallow
>  invalid values of 'forward_delay' when STP was turned off.
> 
>  2)  The above change was actually a change in the user observed
>  behavior and broke things like libvirt and other network configs
>  that set 'forward_delay' to 0 without enabling STP.  The value
>  of 0 is actually used when STP is turned off to immediately mark
>  the bridge as forwarding.
> 
> Signed-off-by: Vlad Yasevich 

Fair enough, applied, thanks Vlad.


Re: [PATCH] bpf_trace: Make dependent on PERF_EVENTS

2015-11-10 Thread Arnd Bergmann
On Tuesday 10 November 2015 15:28:17 Steven Rostedt wrote:
> Arnd Bergmann reported:
> 
>   In my ARM randconfig tests, I'm getting a build error for
>   newly added code in bpf_perf_event_read and bpf_perf_event_output
>   whenever CONFIG_PERF_EVENTS is disabled:
> 
>   kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
>   kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
>   if (event->oncpu != smp_processor_id() ||
>            ^
>   kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
>         event->pmu->count)
> 
>   This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
>   is disabled. I'm not sure if that is a configuration we care
>   about, otherwise we could prevent this case from occurring by
>   adding Kconfig dependencies.
> 
> Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS.
> By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine.
> 
> Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel
> Reported-by: Arnd Bergmann 
> Signed-off-by: Steven Rostedt 
> 

Ok, sounds good.

Acked-by: Arnd Bergmann 


Re: [PATCH nf 1/2] net: add __netdev_alloc_pcpu_stats() to indicate gfp flags

2015-11-10 Thread David Miller
From: Pablo Neira Ayuso 
Date: Tue, 10 Nov 2015 17:36:28 +0100

> nf_tables may create percpu counters from the packet path through its
> dynamic set instantiation infrastructure, so we need a way to allocate
> this through GFP_ATOMIC.
> 
> Signed-off-by: Pablo Neira Ayuso 

Acked-by: David S. Miller 


Re: [PATCH] bpf_trace: Make dependent on PERF_EVENTS

2015-11-10 Thread David Miller
From: Steven Rostedt 
Date: Tue, 10 Nov 2015 15:28:17 -0500

> 
> Arnd Bergmann reported:
> 
>   In my ARM randconfig tests, I'm getting a build error for
>   newly added code in bpf_perf_event_read and bpf_perf_event_output
>   whenever CONFIG_PERF_EVENTS is disabled:
> 
>   kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
>   kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
>   if (event->oncpu != smp_processor_id() ||
>            ^
>   kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
>         event->pmu->count)
> 
>   This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
>   is disabled. I'm not sure if that is a configuration we care
>   about, otherwise we could prevent this case from occurring by
>   adding Kconfig dependencies.
> 
> Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS.
> By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine.
> 
> Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel
> Reported-by: Arnd Bergmann 
> Signed-off-by: Steven Rostedt 

I'll apply this, thanks Steven et al.


Re: [RFC PATCH net-next v2 7/8] openvswitch: Delay conntrack helper call for new connections.

2015-11-10 Thread Jarno Rajahalme

> On Nov 9, 2015, at 5:26 AM, Patrick McHardy  wrote:
> 
> On 06.11, Jarno Rajahalme wrote:
>> There is no need to help connections that are not confirmed, so we can
>> delay helping new connections to the time when they are confirmed.
>> This change is needed for NAT support, and having this as a separate
>> patch will make the following NAT patch a bit easier to review.
> 
> For the first packet a helper receives the connection is always unconfirmed.
> It makes no sense to confirm it if the helper drops the packet.
> 

Right, the nf_conntrack_confirm() call is still done only if the helper accepts
the packet.

The issue this patch deals with is that in a fairly typical pattern the packet
will be passed through conntrack by a CT action and then recirculated (for
matching on the connection state, using the RECIRC action), and later
another CT action is used to confirm the connection, possibly with NAT. Before
this patch, __ovs_ct_lookup() would have passed the packet through the helper
in the first step, while the NAT call would only happen with the second step,
i.e., in the wrong order. With this patch the helper for new connections is
called with the second step, after calling the NAT code (as added by patch
8/8). For non-new packets we must call the helper with the first conntrack
lookup, as there typically are no later CT actions for packets belonging to
existing connections. For new connections we know that if the first CT action
does not have the ‘commit’ flag (which causes nf_conntrack_confirm() to be
called), we can safely postpone the helper call, as there has to be a later CT
action for the connection to be confirmed.
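
The rule described above can be written down as a small standalone predicate.
This is only a sketch of the decision, not the kernel code: the function name
should_call_helper() and its boolean parameters are made up for illustration,
standing in for "nf_ct_get() returned a connection", nf_ct_is_confirmed(),
skb_nfct_cached() and info->commit from the quoted patch.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the helper-call condition.
 *
 * have_ct:   a conntrack entry is attached to the packet
 * confirmed: the connection is already confirmed
 * cached:    this CT action reused an earlier conntrack lookup
 * commit:    this CT action carries the 'commit' flag
 */
static bool should_call_helper(bool have_ct, bool confirmed,
			       bool cached, bool commit)
{
	if (!have_ct)
		return false;

	/* Confirmed connection: help right after the (first) conntrack
	 * lookup, i.e. only when the lookup was not served from cache.
	 * Unconfirmed connection: postpone until the committing action,
	 * so that the helper runs after NAT.
	 */
	return confirmed ? !cached : commit;
}
```

For example, a new connection seen by a first CT action without 'commit'
gives false (helper postponed), and the later committing CT action gives true.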

  Jarno

>> Signed-off-by: Jarno Rajahalme 
>> ---
>> net/openvswitch/conntrack.c | 20 +++-
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 7aa38fa..ba44287 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -458,6 +458,7 @@ static bool skb_nfct_cached(struct net *net,
>> /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if
>>  * not done already.  Update key with new CT state after passing the packet
>>  * through conntrack.
>> + * Note that invalid packets are accepted while the skb->nfct remains unset!
>>  */
>> static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
>> const struct ovs_conntrack_info *info,
>> @@ -468,7 +469,11 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
>>   * actually run the packet through conntrack twice unless it's for a
>>   * different zone.
>>   */
>> -if (!skb_nfct_cached(net, key, info, skb)) {
>> +bool cached = skb_nfct_cached(net, key, info, skb);
>> +enum ip_conntrack_info ctinfo;
>> +struct nf_conn *ct;
>> +
>> +if (!cached) {
>>  struct nf_conn *tmpl = info->ct;
>>  int err;
>> 
>> @@ -491,11 +496,16 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
>>  return -ENOENT;
>> 
>>  ovs_ct_update_key(skb, key, true);
>> +}
>> 
>> -if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
>> -WARN_ONCE(1, "helper rejected packet");
>> -return -EINVAL;
>> -}
>> +/* Call the helper right after nf_conntrack_in() for confirmed
>> + * connections, but only when committing for unconfirmed connections.
>> + */
>> +ct = nf_ct_get(skb, &ctinfo);
>> +if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit)
>> +&& ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
>> +WARN_ONCE(1, "helper rejected packet");
>> +return -EINVAL;
>>  }
>> 
>>  return 0;
>> -- 
>> 2.1.4
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Tue, 10 Nov 2015 20:53:19 +
>
>> David Miller  writes:
>> 
>>> From: Måns Rullgård 
>>> Date: Tue, 10 Nov 2015 18:05:15 +
>>>
>>>> Because I haven't been following the netdev list closely for the last
>>>> five years, and no documentation I read mentioned this function.  I can
>>>> certainly change it.
>>>
>>> It is always advisable to mimic what other drivers do and use them as
>>> a reference, rather than depend upon documentation which by definition
>>> is always going to be out of sync with the source tree.
>> 
>> Sure.  The trick is to pick the right driver(s) to use as reference.
>> Quite a few of them don't use that function.
>
> If you really are stumped on this matter, start at least with the
> ixgbe driver.  In fact pretty much every Intel ethernet driver is
> a reasonable reference.  Others to check out are bnx2x and mlx5.

Even ixgbe uses napi_complete() while netdevice.h says one should
"consider using napi_complete_done() instead."  Did the author consider
it and decide not to, or has the driver simply not been updated?

As for the napi_gro_receive() function, calling that instead of
netif_receive_skb() is easy enough, or are there other things I should
be doing in addition?

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Eric Dumazet
On Tue, 2015-11-10 at 21:21 +, Måns Rullgård wrote:

> Even ixgbe uses napi_complete() while netdevice.h says one should
> "consider using napi_complete_done() instead."  Did the author consider
> it and decide not to, or has the driver simply not been updated?

napi_complete_done() is quite new, very few drivers use it.

It still requires tuning (/sys/class/net/ethX/gro_flush_timeout).

> 
> As for the napi_gro_receive() function, calling that instead of
> netif_receive_skb() is easy enough, or are there other things I should
> be doing in addition?

Nothing comes to mind, if you already have a NAPI context,
napi_gro_receive() is the way to go...





Re: [PATCH net v2 3/3] packet: fix tpacket_snd max frame and vlan handling

2015-11-10 Thread Daniel Borkmann

On 11/10/2015 11:52 PM, Willem de Bruijn wrote:
>>  if (sock->type == SOCK_DGRAM) {
>> -   err = dev_hard_header(skb, dev, ntohs(proto), addr,
>> -   NULL, tp_len);
>> +   /* In DGRAM sockets, we expect struct sockaddr_ll was filled
>> +* via struct msghdr, so we have dest mac and skb->protocol.
>> +* Otherwise there's not too much useful things we can do in
>> +* this flush run.
>> +*/
>> +   err = dev_hard_header(skb, dev, ntohs(skb->protocol), addr,
>> + NULL, tp_len);
>
> This change is not really necessary.

Sure agreed, I found it helpful though. Don't mind removing it.

>>  if (unlikely(err < 0))
>>  return -EINVAL;
>> -   } else if (dev->hard_header_len) {
>
> Why remove the check on hard_header_len?

Hmm, the patch doesn't remove the check (it's moved further below).

>> -   if (ll_header_truncated(dev, tp_len))
>> -   return -EINVAL;
>> +   } else {
>> +   /* If skb->protocol is still 0, try to infer/guess it. Might
>> +* not be fully reliable in the sense that a user could still
>> +* change/race data afterwards, but on the other hand the proto
>
> The race goes away when probing it after the copy in skb_store_bits.
> Then it is also certain that tp_len is long enough to hold the entire
> link layer header.

The skb_store_bits() is only done in case we do have a dev->hard_header_len
or in case where we run into a possible situation where we have the additional
4 bytes on a full frame. In that case we need to check them properly, which
requires copying, otherwise we don't copy any header.

>> +* can be set arbitrarily anyways. We only need to take care
>> +* in case of extra large VLAN frames.
>> +*/
>> +   if (!skb->protocol && tp_len >= ETH_HLEN)
>> +   skb->protocol = ((struct ethhdr *)data)->h_proto;
>
> Packet sockets are not restricted to link layer of type Ethernet.
>
> There are a few other points in this file that also cast mac header
> to eth_hdr(skb).

Ok, the set doesn't address this assumption which we have elsewhere, too.
Do you suggest to also check on dev->type for these cases?

Thanks,
Daniel


Re: [PATCH net v2 3/3] packet: fix tpacket_snd max frame and vlan handling

2015-11-10 Thread Willem de Bruijn
> if (sock->type == SOCK_DGRAM) {
> -   err = dev_hard_header(skb, dev, ntohs(proto), addr,
> -   NULL, tp_len);
> +   /* In DGRAM sockets, we expect struct sockaddr_ll was filled
> +* via struct msghdr, so we have dest mac and skb->protocol.
> +* Otherwise there's not too much useful things we can do in
> +* this flush run.
> +*/
> +   err = dev_hard_header(skb, dev, ntohs(skb->protocol), addr,
> + NULL, tp_len);

This change is not really necessary.

> if (unlikely(err < 0))
> return -EINVAL;
> -   } else if (dev->hard_header_len) {

Why remove the check on hard_header_len?

> -   if (ll_header_truncated(dev, tp_len))
> -   return -EINVAL;
> +   } else {
> +   /* If skb->protocol is still 0, try to infer/guess it. Might
> +* not be fully reliable in the sense that a user could still
> +* change/race data afterwards, but on the other hand the proto

The race goes away when probing it after the copy in skb_store_bits.
Then it is also certain that tp_len is long enough to hold the entire
link layer header.
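
To illustrate the point about probing after the copy, here is a small
user-space sketch (not the af_packet code). The function copy_then_probe()
and the HDR_LEN constant are hypothetical names; the idea is only that the
protocol field is read from a private snapshot of the header rather than
from the buffer user space can still modify, and that the length check
happens before the copy.

```c
#include <stdint.h>
#include <string.h>

#define HDR_LEN 14	/* Ethernet header length (same value as ETH_HLEN) */

/*
 * Copy the link layer header out of a buffer that another party may
 * still write to, then parse the EtherType from the private copy.
 * Returns 0 on success, -1 if the frame cannot hold a full header.
 * The EtherType is returned in host byte order.
 */
static int copy_then_probe(const uint8_t *shared, size_t len,
			   uint8_t hdr[HDR_LEN], uint16_t *ethertype)
{
	if (len < HDR_LEN)
		return -1;

	/* Snapshot first: later writes to 'shared' no longer matter. */
	memcpy(hdr, shared, HDR_LEN);

	/* Bytes 12 and 13 of an Ethernet header hold the EtherType. */
	*ethertype = (uint16_t)((hdr[12] << 8) | hdr[13]);
	return 0;
}
```

Any later size decision (for instance allowing an oversized frame only for
0x8100) would then be based on the copied header, not on the shared ring.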

> +* can be set arbitrarily anyways. We only need to take care
> +* in case of extra large VLAN frames.
> +*/
> +   if (!skb->protocol && tp_len >= ETH_HLEN)
> +   skb->protocol = ((struct ethhdr *)data)->h_proto;

Packet sockets are not restricted to link layer of type Ethernet.

There are a few other points in this file that also cast mac header
to eth_hdr(skb).

>
> -   skb_push(skb, dev->hard_header_len);
> -   err = skb_store_bits(skb, 0, data,
> -   dev->hard_header_len);
> -   if (unlikely(err))
> -   return err;
> +   /* Copy Ethernet header in case we need to deal with extra
> +* VLAN space as otherwise data could change underneath us.
> +* The caller already accommodated for enough linear room.
> +*/
> +   if (dev->hard_header_len || tp_len > dev->mtu + reserve) {
> +   int hdr_len = dev->hard_header_len;
> +
> +   if (hdr_len < ETH_HLEN)
> +   hdr_len = ETH_HLEN;
> +   if (unlikely(ll_header_truncated(hdr_len, tp_len)))
> +   return -EINVAL;
> +
> +   skb_push(skb, hdr_len);
> +   err = skb_store_bits(skb, 0, data, hdr_len);
> +   if (unlikely(err))
> +   return err;
>
> -   data += dev->hard_header_len;
> -   to_write -= dev->hard_header_len;
> +   data += hdr_len;
> +   to_write -= hdr_len;
> +   }
> }
>
> offset = offset_in_page(data);
> @@ -2447,6 +2471,20 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
> len = ((to_write > len_max) ? len_max : to_write);
> }
>
> +   /* Check for full frame with extra VLAN space. We can probe
> +* here on the linear header, if necessary. Earlier code
> +* assumed this would be a VLAN pkt, double-check this now
> +* that we have the actual packet and protocol at hand.
> +*/
> +   if (tp_len > dev->mtu + reserve) {
> +   if (skb->protocol != htons(ETH_P_8021Q))
> +   return -EMSGSIZE;
> +
> +   skb_reset_mac_header(skb);
> +   if (eth_hdr(skb)->h_proto != htons(ETH_P_8021Q))
> +   return -EMSGSIZE;
> +   }
> +
> skb_probe_transport_header(skb, 0);
>
> return tp_len;
> @@ -2493,12 +2531,12 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> if (unlikely(!(dev->flags & IFF_UP)))
> goto out_put;
>
> -   reserve = dev->hard_header_len + VLAN_HLEN;
> -   size_max = po->tx_ring.frame_size
> -   - (po->tp_hdrlen - sizeof(struct sockaddr_ll));
> -
> -   if (size_max > dev->mtu + reserve)
> -   size_max = dev->mtu + reserve;
> +   if (po->sk.sk_socket->type == SOCK_RAW)
> +   reserve = dev->hard_header_len;
> +   size_max = po->tx_ring.frame_size - (po->tp_hdrlen -
> +sizeof(struct sockaddr_ll));
> +   if (size_max > dev->mtu + reserve + VLAN_HLEN)
> +   size_max = dev->mtu + reserve + VLAN_HLEN;
>
> do {
> ph = packet_current_frame(po, &po->tx_ring,
> @@ -2523,20 +2561,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> goto 

Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue

2015-11-10 Thread Rainer Weikusat
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes available again, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed even though "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
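
The relay idea can be modelled in user space roughly as follows. This is a
simplified sketch and not the patch itself: all names (sim_sock, wq_add,
peer_wake_relay, ...) are invented for the illustration, there is no locking,
and a counter stands in for waking the client's own wait queue.

```c
#include <stddef.h>

struct wait_entry {
	void (*func)(struct wait_entry *e);	/* wake function */
	struct wait_entry *next;
};

struct wait_queue {
	struct wait_entry *head;
};

static void wq_add(struct wait_queue *q, struct wait_entry *e)
{
	e->next = q->head;
	q->head = e;
}

static void wq_remove(struct wait_queue *q, struct wait_entry *e)
{
	struct wait_entry **pp = &q->head;

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = e->next;
		e->next = NULL;
	}
}

static void wq_wake_all(struct wait_queue *q)
{
	struct wait_entry *e = q->head;

	while (e) {
		/* Save the successor: the wake function may unlink 'e'. */
		struct wait_entry *next = e->next;

		e->func(e);
		e = next;
	}
}

struct sim_sock {
	struct wait_queue peer_wait;	/* server side: woken after a receive */
	struct wait_entry peer_wake;	/* client side: entry parked on a server */
	struct sim_sock *relay_peer;	/* server we are parked on, or NULL */
	int wakeups;			/* stands in for waking the client's own queue */
};

/* Wake function: break the connection to the peer queue, then relay. */
static void peer_wake_relay(struct wait_entry *e)
{
	struct sim_sock *client = (struct sim_sock *)
		((char *)e - offsetof(struct sim_sock, peer_wake));

	wq_remove(&client->relay_peer->peer_wait, e);
	client->relay_peer = NULL;
	client->wakeups++;
}

/* Called when a send or poll finds the server's receive queue full. */
static void peer_wake_connect(struct sim_sock *client, struct sim_sock *server)
{
	if (client->relay_peer)
		return;
	client->peer_wake.func = peer_wake_relay;
	client->relay_peer = server;
	wq_add(&server->peer_wait, &client->peer_wake);
}

/* Called on reconnect, dead peer detection, or close of the client. */
static void peer_wake_disconnect(struct sim_sock *client)
{
	if (!client->relay_peer)
		return;
	wq_remove(&client->relay_peer->peer_wait, &client->peer_wake);
	client->relay_peer = NULL;
}
```

The property this tries to show is that the server queue only ever holds
entries the client can remove itself, so nothing outside the client keeps a
pointer into the server socket after the association is gone.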

Signed-off-by: Rainer Weikusat 
---

- use wait_queue_t passed as argument to _relay

- fix possible deadlock and logic error in _dgram_sendmsg by
  straightening the control flow ("spaghetti code considered
  confusing")

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE      0
 #define UNIX_GC_MAYBE_CYCLE    1
        struct socket_wq        peer_wq;
+       wait_queue_t            peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f6582..4297d8e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,122 @@ found:
return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+ void *key)
+{
+   struct unix_sock *u;
+   wait_queue_head_t *u_sleep;
+
+   u = container_of(q, struct unix_sock, peer_wake);
+
+   __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+   q);
+   u->peer_wake.private = NULL;
+
+   /* relaying can only happen while the wq still exists */
+   u_sleep = sk_sleep(&u->sk);
+   if (u_sleep)
+   wake_up_interruptible_poll(u_sleep, key);
+
+   return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+   struct unix_sock *u, *u_other;
+   int rc;
+
+   u = unix_sk(sk);
+   u_other = unix_sk(other);
+   rc = 0;
+
+   

[PATCH 1/2] arm64: bpf: add 'store immediate' instruction

2015-11-10 Thread Yang Shi
aarch64 doesn't have a native store-immediate instruction, so the operation
has to be implemented by the following instruction sequence:

Load immediate to register
Store register

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 
CC: Xi Wang 
---
 arch/arm64/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 6809647..49c1f1b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -563,7 +563,25 @@ emit_cond_jmp:
case BPF_ST | BPF_MEM | BPF_H:
case BPF_ST | BPF_MEM | BPF_B:
case BPF_ST | BPF_MEM | BPF_DW:
-   goto notyet;
+   /* Load imm to a register then store it */
+   ctx->tmp_used = 1;
+   emit_a64_mov_i(1, tmp2, off, ctx);
+   emit_a64_mov_i(1, tmp, imm, ctx);
+   switch (BPF_SIZE(code)) {
+   case BPF_W:
+   emit(A64_STR32(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_H:
+   emit(A64_STRH(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_B:
+   emit(A64_STRB(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_DW:
+   emit(A64_STR64(tmp, dst, tmp2), ctx);
+   break;
+   }
+   break;
 
/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_W:
-- 
2.0.2



[PATCH 2/2] arm64: bpf: add BPF XADD instruction

2015-11-10 Thread Yang Shi
aarch64 doesn't have native support for the XADD instruction, so implement it
with the following instruction sequence:

Load (dst + off) to a register
Add src to it
Store it back to (dst + off)

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 
CC: Xi Wang 
---
 arch/arm64/net/bpf_jit_comp.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 49c1f1b..0b1d2d3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -609,7 +609,21 @@ emit_cond_jmp:
case BPF_STX | BPF_XADD | BPF_W:
/* STX XADD: lock *(u64 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_DW:
-   goto notyet;
+   ctx->tmp_used = 1;
+   emit_a64_mov_i(1, tmp2, off, ctx);
+   switch (BPF_SIZE(code)) {
+   case BPF_W:
+   emit(A64_LDR32(tmp, dst, tmp2), ctx);
+   emit(A64_ADD(is64, tmp, tmp, src), ctx);
+   emit(A64_STR32(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_DW:
+   emit(A64_LDR64(tmp, dst, tmp2), ctx);
+   emit(A64_ADD(is64, tmp, tmp, src), ctx);
+   emit(A64_STR64(tmp, dst, tmp2), ctx);
+   break;
+   }
+   break;
 
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
case BPF_LD | BPF_ABS | BPF_W:
@@ -679,9 +693,6 @@ emit_cond_jmp:
}
break;
}
-notyet:
-   pr_info_once("*** NOT YET: opcode %02x ***\n", code);
-   return -EFAULT;
 
default:
pr_err_once("unknown opcode %02x\n", code);
-- 
2.0.2



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Andy Shevchenko  writes:

> On Wed, Nov 11, 2015 at 12:34 AM, Måns Rullgård  wrote:
>> Andy Shevchenko  writes:
>>
>>>> +static inline void nb8800_maskb(struct nb8800_priv *priv, int reg,
>>>> +   u32 mask, u32 val)
>>>> +{
>>>> +   u32 old = nb8800_readb(priv, reg);
>>>> +   u32 new = (old & ~mask) | val;
>>>
>>> Shouldn't it be "… | (val & mask);" ?
>>
>> No, it's meant to replace the bits in mask with the corresponding bits
>> from val.
>
> But you unconditionally use entire val value which might bring bits
> outside of mask.

Very well, I'll apply the mask to both then.
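
For reference, the masked update being agreed on here can be sketched in
plain C as below. This is only an illustration, not the driver's code:
masked_update() is a hypothetical name, and it computes the value that
would be written instead of touching a register.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the nb8800 driver): compute the value
 * a masked register update would write back.
 */
static uint32_t masked_update(uint32_t old, uint32_t mask, uint32_t val)
{
    /*
     * Clear the field selected by mask, then insert only those bits of
     * val that fall inside mask, so stray bits in val cannot leak into
     * the rest of the register.
     */
    return (old & ~mask) | (val & mask);
}
```

With old = 0xF0F0, mask = 0x00FF and val = 0x1234 this yields 0xF034,
whereas the unmasked form (old & ~mask) | val would yield 0xF234.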

>>>> +   nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
>>>> +   wmb();  /* ensure desc addr is written before starting DMA */
>>>
>>> Hm… Have I missed corresponding rmb() ? If it's about MMIO, perhaps 
>>> mmiowb() ?
>>
>> Possibly.
>
> Standalone wmb() doesn't make sense.

It does if you need to enforce ordering between normal and I/O memory.
In fact, since the descriptor is filled in using normal memory accesses,
my understanding is that mmiowb() would be insufficient here.  The
comment could be improved, however.

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction

2015-11-10 Thread Eric Dumazet
On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> aarch64 doesn't have native support for XADD instruction, implement it by
> the below instruction sequence:
> 
> Load (dst + off) to a register
> Add src to it
> Store it back to (dst + off)

Not really what is needed ?

See this BPF_XADD as an atomic_add() equivalent.
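
The difference between the posted load/add/store sequence and an
atomic_add() can be sketched in userspace C11 as follows. This is an
illustration under assumptions, not kernel or JIT code: plain_add() and
atomic_add_u64() are hypothetical names, and relaxed ordering is chosen
here only because an add with no return value needs no ordering
guarantee in this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Non-atomic version, mirroring the three emitted instructions: another
 * CPU can modify *addr between the load and the store, and that update
 * is then silently overwritten.
 */
static void plain_add(uint64_t *addr, uint64_t src)
{
    uint64_t tmp = *addr;   /* load (dst + off) */
    tmp += src;             /* add src */
    *addr = tmp;            /* store back to (dst + off) */
}

/*
 * Atomic version: one indivisible read-modify-write, so concurrent
 * adders never lose an update.
 */
static void atomic_add_u64(_Atomic uint64_t *addr, uint64_t src)
{
    atomic_fetch_add_explicit(addr, src, memory_order_relaxed);
}
```

On arm64 a compiler typically lowers the atomic version to an
exclusive load/store retry loop or to a single atomic instruction where
the CPU provides one, which is the kind of sequence the JIT would need
to emit.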




Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Andy Shevchenko
On Tue, Nov 10, 2015 at 6:14 PM, Mans Rullgard  wrote:
> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
> It is an almost complete rewrite of a driver originally found in
> a Sigma Designs 2.6.22 tree.

A few nitpicks below.

>
> Signed-off-by: Mans Rullgard 
> ---
> Changes:
> - Refactored mdio access functions
> - Refactored register access helpers
> - Improved error handling in rx buffer allocation
> - Optimised some fifo parameters
> - Overhauled tx dma. Multiple packets are now chained in a single dma
>   operation if xmit_more is set, improving performance.
> - Improved rx irq handling. It's not possible to disable interrupts
>   entirely for napi poll, but they can be slowed down a little.
> - Use readx_poll_timeout in various places
> - Improved error detection
> - Improved statistics
> - Report hardware statistics counters through ethtool
> - Improved tangox-specific setup
> - Support for flow control using pause frames
> - Explanatory comments added
> - Various minor stylistic changes
> ---
>  drivers/net/ethernet/Kconfig |1 +
>  drivers/net/ethernet/Makefile|1 +
>  drivers/net/ethernet/aurora/Kconfig  |   20 +
>  drivers/net/ethernet/aurora/Makefile |1 +
>  drivers/net/ethernet/aurora/nb8800.c | 1530 ++
>  drivers/net/ethernet/aurora/nb8800.h |  314 +++
>  6 files changed, 1867 insertions(+)
>  create mode 100644 drivers/net/ethernet/aurora/Kconfig
>  create mode 100644 drivers/net/ethernet/aurora/Makefile
>  create mode 100644 drivers/net/ethernet/aurora/nb8800.c
>  create mode 100644 drivers/net/ethernet/aurora/nb8800.h
>
> diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
> index 05aa759..8310163 100644
> --- a/drivers/net/ethernet/Kconfig
> +++ b/drivers/net/ethernet/Kconfig
> @@ -29,6 +29,7 @@ source "drivers/net/ethernet/apm/Kconfig"
>  source "drivers/net/ethernet/apple/Kconfig"
>  source "drivers/net/ethernet/arc/Kconfig"
>  source "drivers/net/ethernet/atheros/Kconfig"
> +source "drivers/net/ethernet/aurora/Kconfig"
>  source "drivers/net/ethernet/cadence/Kconfig"
>  source "drivers/net/ethernet/adi/Kconfig"
>  source "drivers/net/ethernet/broadcom/Kconfig"
> diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
> index ddfc808..b435fb0 100644
> --- a/drivers/net/ethernet/Makefile
> +++ b/drivers/net/ethernet/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_NET_XGENE) += apm/
>  obj-$(CONFIG_NET_VENDOR_APPLE) += apple/
>  obj-$(CONFIG_NET_VENDOR_ARC) += arc/
>  obj-$(CONFIG_NET_VENDOR_ATHEROS) += atheros/
> +obj-$(CONFIG_NET_VENDOR_AURORA) += aurora/
>  obj-$(CONFIG_NET_CADENCE) += cadence/
>  obj-$(CONFIG_NET_BFIN) += adi/
>  obj-$(CONFIG_NET_VENDOR_BROADCOM) += broadcom/
> diff --git a/drivers/net/ethernet/aurora/Kconfig b/drivers/net/ethernet/aurora/Kconfig
> new file mode 100644
> index 000..a3c7106
> --- /dev/null
> +++ b/drivers/net/ethernet/aurora/Kconfig
> @@ -0,0 +1,20 @@
> +config NET_VENDOR_AURORA
> +   bool "Aurora VLSI devices"
> +   help
> + If you have a network (Ethernet) device belonging to this class,
> + say Y.
> +
> + Note that the answer to this question doesn't directly affect the
> + kernel: saying N will just cause the configurator to skip all
> + questions about Aurora devices. If you say Y, you will be asked
> + for your specific device in the following questions.
> +
> +if NET_VENDOR_AURORA
> +
> +config AURORA_NB8800
> +   tristate "Aurora AU-NB8800 support"
> +   select PHYLIB
> +   help
> +Support for the AU-NB8800 gigabit Ethernet controller.
> +
> +endif
> diff --git a/drivers/net/ethernet/aurora/Makefile b/drivers/net/ethernet/aurora/Makefile
> new file mode 100644
> index 000..6cb528a
> --- /dev/null
> +++ b/drivers/net/ethernet/aurora/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_AURORA_NB8800) += nb8800.o
> diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
> new file mode 100644
> index 000..11cd389
> --- /dev/null
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -0,0 +1,1530 @@
> +/*
> + * Copyright (C) 2015 Mans Rullgard 
> + *
> + * Mostly rewritten, based on driver from Sigma Designs.  Original
> + * copyright notice below.
> + *
> + *
> + * Driver for tangox SMP864x/SMP865x/SMP867x/SMP868x builtin Ethernet Mac.
> + *
> + * Copyright (C) 2005 Maxime Bizon 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> 

Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Andy Shevchenko  writes:

>> +static inline void nb8800_maskb(struct nb8800_priv *priv, int reg,
>> +   u32 mask, u32 val)
>> +{
>> +   u32 old = nb8800_readb(priv, reg);
>> +   u32 new = (old & ~mask) | val;
>
> Shouldn't it be "… | (val & mask);" ?

No, it's meant to replace the bits in mask with the corresponding bits
from val.

> + empty line?
>
>> +   if (new != old)
>> +   nb8800_writeb(priv, reg, new);
>> +}
>> +

[...]

>> +static void nb8800_receive(struct net_device *dev, int i, int len)
>
> unsigned int i ?
> len as well?

Does it matter?  The values are nowhere near overflowing signed int.


[...]

>> +   /* If a packet arrived after we last checked but
>> +* before writing RX_ITR, the interrupt will be
>> +* delayed, so we retrieve it now. */
>
> Block comments usually
> /*
>  * text
>  */

Documentation/CodingStyle says net/ and drivers/net/ are special, though
currently a mix of styles can be found.  Personally, I don't
particularly care.

> Can be longer lines?

Still won't fit on two lines.

>> +   if (priv->rx_descs[next].report)
>> +   goto again;
>> +
>> +   napi_complete_done(napi, work);
>> +   }
>> +
>> +   return work;
>> +}
>> +
>> +static void nb8800_tx_dma_start(struct net_device *dev)
>> +{
>> +   struct nb8800_priv *priv = netdev_priv(dev);
>> +   struct nb8800_tx_buf *txb;
>> +   u32 txc_cr;
>> +
>> +   txb = &priv->tx_bufs[priv->tx_queue];
>> +   if (!txb->ready)
>> +   return;
>> +
>> +   txc_cr = nb8800_readl(priv, NB8800_TXC_CR);
>> +   if (txc_cr & TCR_EN)
>> +   return;
>> +
>> +   nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
>> +   wmb();  /* ensure desc addr is written before starting DMA */
>
> Hm… Have I missed corresponding rmb() ? If it's about MMIO, perhaps mmiowb() ?

Possibly.

>> +   nb8800_writel(priv, NB8800_TXC_CR, txc_cr | TCR_EN);
>> +
>> +   priv->tx_queue = (priv->tx_queue + txb->chain_len) % TX_DESC_COUNT;
>> +}

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH net v2 3/3] packet: fix tpacket_snd max frame and vlan handling

2015-11-10 Thread Willem de Bruijn
On Tue, Nov 10, 2015 at 6:12 PM, Daniel Borkmann  wrote:
> On 11/10/2015 11:52 PM, Willem de Bruijn wrote:
>>>
>>>  if (sock->type == SOCK_DGRAM) {
>>> -   err = dev_hard_header(skb, dev, ntohs(proto), addr,
>>> -   NULL, tp_len);
>>> +   /* In DGRAM sockets, we expect struct sockaddr_ll was
>>> filled
>>> +* via struct msghdr, so we have dest mac and
>>> skb->protocol.
>>> +* Otherwise there's not too much useful things we can do
>>> in
>>> +* this flush run.
>>> +*/
>>> +   err = dev_hard_header(skb, dev, ntohs(skb->protocol),
>>> addr,
>>> + NULL, tp_len);
>>
>>
>> This change is not really necessary.
>
>
> Sure agreed, I found it helpful though. Don't mind removing it.
>
>>>  if (unlikely(err < 0))
>>>  return -EINVAL;
>>> -   } else if (dev->hard_header_len) {
>>
>>
>> Why remove the check on hard_header_len?
>
>
> Hmm, the patch doesn't remove the check (it's moved further below).
>
>>> -   if (ll_header_truncated(dev, tp_len))
>>> -   return -EINVAL;
>>> +   } else {
>>> +   /* If skb->protocol is still 0, try to infer/guess it.
>>> Might
>>> +* not be fully reliable in the sense that a user could
>>> still
>>> +* change/race data afterwards, but on the other hand the
>>> proto
>>
>>
>> The race goes away when probing it after the copy in skb_store_bits.
>> Then it is also certain that tp_len is long enough to hold the entire
>> link layer header.
>
>
> The skb_store_bits() is only done in case we do have a dev->hard_header_len
> or in case where we run into a possible situation where we have the
> additional
> 4 bytes on a full frame. In that case we need to check them properly, which
> requires copying, otherwise we don't copy any header.

I assumed that hard_header_len has to be non-zero if there
is a link layer header to probe. If we only intend to implement
probing in the case of Ethernet, then it certainly holds.

>
>>> +* can be set arbitrarily anyways. We only need to take
>>> care
>>> +* in case of extra large VLAN frames.
>>> +*/
>>> +   if (!skb->protocol && tp_len >= ETH_HLEN)
>>> +   skb->protocol = ((struct ethhdr *)data)->h_proto;
>>
>>
>> Packet sockets are not restricted to link layer of type Ethernet.
>>
>> There are a few other points in this file that also cast mac header
>> to eth_hdr(skb).
>
>
> Ok, the set doesn't address this assumption which we have elsewhere, too.
> Do you suggest to also check on dev->type for these cases?

Yes. If I'm right, then the other cases have to be fixed, too. One of the
eth_hdr(skb) calls was introduced by patch 09effa67a18d, where the
deleted code shows how it is safely restricted to ethernet:

  -   if (dev->type == ARPHRD_ETHER) {
  -   skb->protocol = eth_type_trans(skb, dev);
  -   if (skb->protocol == htons(ETH_P_8021Q))
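
The restriction discussed above can be sketched as standalone C. This is
a hedged illustration, not the af_packet code: probe_ethertype() and the
SKETCH_* constants are hypothetical stand-ins (the values mirror
ARPHRD_ETHER and ETH_HLEN), and the result is returned in host byte
order for readability.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARPHRD_ETHER 1   /* stand-in for ARPHRD_ETHER */
#define SKETCH_ETH_HLEN     14  /* stand-in for ETH_HLEN */

/*
 * Hypothetical helper: infer the EtherType from a raw frame, but only
 * when the device really is Ethernet and the frame is long enough to
 * hold a full Ethernet header. Returns 0 when nothing can be inferred.
 */
static uint16_t probe_ethertype(unsigned short dev_type,
                                const unsigned char *data, size_t len)
{
    if (dev_type != SKETCH_ARPHRD_ETHER || len < SKETCH_ETH_HLEN)
        return 0;

    /* The EtherType follows the two 6-byte MAC addresses, big endian. */
    return (uint16_t)((data[12] << 8) | data[13]);
}
```

A frame on a non-Ethernet device type, or one shorter than 14 bytes, is
left unclassified instead of being read as if it had an Ethernet header.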

>
> Thanks,
> Daniel


Re: [PATCH net v2 3/3] packet: fix tpacket_snd max frame and vlan handling

2015-11-10 Thread Daniel Borkmann

On 11/11/2015 12:24 AM, Willem de Bruijn wrote:
> On Tue, Nov 10, 2015 at 6:12 PM, Daniel Borkmann  wrote:
>> On 11/10/2015 11:52 PM, Willem de Bruijn wrote:
>>>>
>>>>  if (sock->type == SOCK_DGRAM) {
>>>> -   err = dev_hard_header(skb, dev, ntohs(proto), addr,
>>>> -   NULL, tp_len);
>>>> +   /* In DGRAM sockets, we expect struct sockaddr_ll was
>>>> filled
>>>> +* via struct msghdr, so we have dest mac and
>>>> skb->protocol.
>>>> +* Otherwise there's not too much useful things we can do
>>>> in
>>>> +* this flush run.
>>>> +*/
>>>> +   err = dev_hard_header(skb, dev, ntohs(skb->protocol),
>>>> addr,
>>>> + NULL, tp_len);
>>>
>>> This change is not really necessary.
>>
>> Sure agreed, I found it helpful though. Don't mind removing it.
>>
>>>>  if (unlikely(err < 0))
>>>>  return -EINVAL;
>>>> -   } else if (dev->hard_header_len) {
>>>
>>> Why remove the check on hard_header_len?
>>
>> Hmm, the patch doesn't remove the check (it's moved further below).
>>
>>>> -   if (ll_header_truncated(dev, tp_len))
>>>> -   return -EINVAL;
>>>> +   } else {
>>>> +   /* If skb->protocol is still 0, try to infer/guess it.
>>>> Might
>>>> +* not be fully reliable in the sense that a user could
>>>> still
>>>> +* change/race data afterwards, but on the other hand the
>>>> proto
>>>
>>> The race goes away when probing it after the copy in skb_store_bits.
>>> Then it is also certain that tp_len is long enough to hold the entire
>>> link layer header.
>>
>> The skb_store_bits() is only done in case we do have a dev->hard_header_len
>> or in case where we run into a possible situation where we have the
>> additional
>> 4 bytes on a full frame. In that case we need to check them properly, which
>> requires copying, otherwise we don't copy any header.
>
> I assumed that hard_header_len has to be non-zero if there
> is a link layer header to probe. If we only intend to implement
> probing in the case of Ethernet, then it certainly holds.

Yeah, I guess we only care about ether_setup() alike devices, so we'd have
ETH_HLEN room as dev->hard_header_len. That will hold, yes.

>>>> +* can be set arbitrarily anyways. We only need to take
>>>> care
>>>> +* in case of extra large VLAN frames.
>>>> +*/
>>>> +   if (!skb->protocol && tp_len >= ETH_HLEN)
>>>> +   skb->protocol = ((struct ethhdr *)data)->h_proto;
>>>
>>> Packet sockets are not restricted to link layer of type Ethernet.
>>>
>>> There are a few other points in this file that also cast mac header
>>> to eth_hdr(skb).
>>
>> Ok, the set doesn't address this assumption which we have elsewhere, too.
>> Do you suggest to also check on dev->type for these cases?
>
> Yes. If I'm right, then the other cases have to be fixed, too. One of the
> eth_hdr(skb) calls was introduced by patch 09effa67a18d, where the
> deleted code shows how it is safely restricted to ethernet:
>
>   -   if (dev->type == ARPHRD_ETHER) {
>   -   skb->protocol = eth_type_trans(skb, dev);
>   -   if (skb->protocol == htons(ETH_P_8021Q))

Saw that, so we need the check as one more fix for 57f89bfa2140 ("network:
Allow af_packet to transmit +4 bytes for VLAN packets.") as well.

I'll see to respin a v3 tomorrow.

Thanks,
Daniel


Re: [PATCH stable <= 3.18] net: add length argument to skb_copy_and_csum_datagram_iovec

2015-11-10 Thread Josh Hunt
On Thu, Oct 29, 2015 at 5:00 AM, Sabrina Dubroca  wrote:
> 2015-10-15, 14:25:03 +0200, Sabrina Dubroca wrote:
>> Without this length argument, we can read past the end of the iovec in
>> memcpy_toiovec because we have no way of knowing the total length of the
>> iovec's buffers.
>>
>> This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
>> csum races when peeking") has been backported but that don't have the
>> ioviter conversion, which is almost all the stable trees <= 3.18.
>>
>> This also fixes a kernel crash for NFS servers when the client uses
>>  -onfsvers=3,proto=udp to mount the export.
>>
>> Signed-off-by: Sabrina Dubroca 
>> Reviewed-by: Hannes Frederic Sowa 
>
> Fixes CVE-2015-8019.
> http://www.openwall.com/lists/oss-security/2015/10/29/1
>
> --
> Sabrina

Greg

Do you have this in your queue? I saw a few other stables pick this
up, but haven't seen it in 3.14 or 3.18 yet. It wasn't clear to me if
this had been fully reviewed yet.

Thanks
-- 
Josh


[PATCH] qed: select ZLIB_INFLATE

2015-11-10 Thread Arnd Bergmann
The newly added qlogic qed driver uses the zlib library, but
misses the dependency:

drivers/built-in.o: In function `qed_alloc_stream_mem':
drivers/net/ethernet/qlogic/qed/qed_main.c:707: undefined reference to `zlib_inflate_workspacesize'
drivers/built-in.o: In function `qed_unzip_data':
drivers/net/ethernet/qlogic/qed/qed_main.c:675: undefined reference to `zlib_inflateInit2'

This changes Kconfig to always select zlib when needed.

Signed-off-by: Arnd Bergmann 
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support")
---
Found on ARM randconfig builds

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 30a6f246dfc9..ddcfcab034c2 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -94,6 +94,7 @@ config NETXEN_NIC
 config QED
tristate "QLogic QED 25/40/100Gb core driver"
depends on PCI
+   select ZLIB_INFLATE
---help---
  This enables the support for ...
 



Re: [PATCH RFC 5/6] net/faraday: Enable NCSI interface

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 17:12 +1100, Gavin Shan wrote:
> 
> > So we require the interface to be opened to talk, so far so good,
> > the NC-SI stack doesn't even need to open it itself, it's acceptable
> > to require userspace to do it. IE. Userspace will choose what interface
> > to use, open it (for DHCP etc... or whatever other reason) and *that*
> > will then trigger the NC-SI negotiation.
> > 
> 
> Yes, NCSI is similar to PHY to some extent. However, PHY's negotiation
> is a purely electrical procedure, no packets received from MAC for it. We
> have the same situation when the NCSI/PHY is going to be brought down.

Right but we similarly only support PHY nego once the driver is open,
at least on most drivers. The only difference really is that we only
set the netif_carrier_on() when the PHY detects a link, while for NC-SI 
we currently need to keep it on always or we lose the queues, but that
can be looked at separately.

> At the beginning, the NCSI packets can be received and transmitted after
> the interface is opened.

Right.

>  Before NCSI negotiation is done, no other packets
> can be received and transmitted.

The interface doesn't care. We can transmit them, they just may not go
anywhere, it's not our problem. Similarly, the companion NIC may or may
not forward incoming packets before the nego is complete, we don't
actually have to care or enforce anything here.

>  For the Rx path (for other packets), the
> NCSI link isn't enabled when NCSI negotiation isn't finished. There might
> have lots of egress packets whose IP addresses can't be resolved to MAC
> address as ARP resolution doesn't work before NCSI negotiation is done.

Correct, but is that a problem ? it's the same thing when we don't have
a link, though I suppose we have a faster path to drop them when the
carrier is down.

> So there is a weird window: interface is up, but no packets (except NCSI
> packets) can be received or transmitted.

Right but that's similar to what we used to do before we had
"intelligent" PHY control... our drivers didn't always know when the
link was up or even if there was a cable plugged.

I agree, it would be nicer to have the "Carrier" follow the
establishment of the NC-SI link, and we should look into fixing that
separately, possibly by establishing a special queue discipline rather
than noop when in that "limbo" mode, but that shouldn't be a blocker
for the patches and certainly doesn't require your driver change that
deals with interrupts while the interface is closed.

> When the interface is brought down, for example by "ifconfig eth0 down",
> The NCSI interface needs to be torn down by transmitting and receiving
> NCSI commands and responses.

That can be done synchronously from the close callback (With timeouts),
can't it ? If the core messes around with our state before close is
called, then we need to do something in the netdev core. However, it's
probably fine to just not do anything, worst case the companion NIC
will forward packets to a closed interface. Not a big deal and
definitely not a show stopper.

>  Similarly, it introduces another weird window:
> interface is down, but NCSI packets still can be transmitted and received.

No, when interface is down, it's down. Nothing comes in and out, we
free the rings, rx skb's, interrupt, it's all gone. We even power down
the NIC in most cases.

Ben.

> > Cheers,
> > Ben.
> > 
> 
> Thanks,
> Gavin
> 


Re: network stream fairness

2015-11-10 Thread Niklas Cassel
On 11/09/2015 06:23 PM, Eric Dumazet wrote:
> On Mon, 2015-11-09 at 17:50 +0100, Niklas Cassel wrote:
> 
>>
>> for i in `seq 1 20`; do ss -temoi dst 192.168.0.141; sleep 1; done
> ...
>> ESTAB  0  0192.168.0.1:54578 
>>   192.168.0.141:5201 rto:0.2 ato:0.04 cwnd:10 uid:20283 
>> ino:7901838 sk:880143735040
>> State  Recv-Q Send-Q Local Address:Port  
>>Peer Address:Port   
>> ESTAB  0  0192.168.0.1:54576 
>>   192.168.0.141:5201 rto:0.2 ato:0.04 cwnd:10 uid:20283 
>> ino:7901837 sk:880143735800
>> ESTAB  0  0192.168.0.1:54574 
>>   192.168.0.141:5201 rto:0.2 ato:0.04 cwnd:10 qack:10 
>> uid:20283 ino:7902591 sk:880223311800
>> ESTAB  1448   0192.168.0.1:54578 
>>   192.168.0.141:5201 rto:0.2 ato:0.04 cwnd:10 uid:20283 
>> ino:7901838 sk:880143735040
> 
> This is the receiver, I would like to see the sender side, since the
> changes we talk about are at his side, of course.

Sorry for that.

iperf3 sample. most samples were equal to this.
[  4]  10.00-11.00  sec  3.75 MBytes  31.5 Mbits/sec
[  6]  10.00-11.00  sec  7.05 MBytes  59.2 Mbits/sec
[SUM]  10.00-11.00  sec  10.8 MBytes  90.7 Mbits/sec


# for i in `seq 1 20`; do ./ss -temoi dst 192.168.0.1; sleep 1; done
State  Recv-Q Send-Q Local Address:Port Peer Address:Port   
 
State  Recv-Q Send-Q Local Address:Port Peer Address:Port   
 
ESTAB  0  0   :::192.168.0.143:5201
:::192.168.0.1:55404 ino:17349 sk:4 <->
 skmem:(r0,rb328320,t0,tb44800,f20480,w0,o0,bl0) ts sack cubic 
wscale:7,5 rto:200 ato:40 mss:1448 cwnd:10 bytes_received:37 segs_out:1 
segs_in:1 rcv_space:28960
ESTAB  0  1   :::192.168.0.143:5201
:::192.168.0.1:55402 timer:(on,210ms,0) ino:18856 sk:5 <->
 skmem:(r0,rb328320,t0,tb44800,f1856,w2240,o0,bl0) ts sack cubic 
wscale:7,5 rto:210 rtt:0.03/0.06 ato:40 mss:1448 cwnd:10 bytes_acked:1 
bytes_received:143 segs_out:5 segs_in:4 send 3861.3Mbps lastack:40 pacing_rate 
7690.6Mbps unacked:1 rcv_space:28960
ESTAB  37 0   :::192.168.0.143:5201
:::192.168.0.1:55406 ino:0 sk:6 <->
 skmem:(r16576,rb328320,t0,tb44800,f3904,w0,o0,bl0) ts sack cubic 
wscale:7,5 rto:200 ato:40 mss:1448 cwnd:10 bytes_received:37 segs_out:1 
segs_in:1 rcv_space:28960
State  Recv-Q Send-Q Local Address:Port Peer Address:Port   
 
ESTAB  0  159280  :::192.168.0.143:5201
:::192.168.0.1:55404 timer:(on,210ms,0) ino:17349 sk:4 <->
 skmem:(r0,rb328320,t27472,tb192640,f120048,w162576,o0,bl0) ts sack 
cubic wscale:7,5 rto:210 rtt:5.326/0.789 ato:40 mss:1448 cwnd:43 ssthresh:16 
bytes_acked:4500384 bytes_received:37 segs_out:3126 segs_in:1586 send 93.5Mbps 
lastsnd:10 lastrcv:1040 pacing_rate 112.2Mbps unacked:17 rcv_space:28960
ESTAB  0  0   :::192.168.0.143:5201
:::192.168.0.1:55402 ino:18856 sk:5 <->
 skmem:(r0,rb328320,t0,tb44800,f0,w0,o0,bl0) ts sack cubic wscale:7,5 
rto:210 rtt:3.738/6.898 ato:40 mss:1448 cwnd:10 bytes_acked:4 
bytes_received:143 segs_out:7 segs_in:7 send 31.0Mbps lastsnd:940 lastrcv:1040 
lastack:940 pacing_rate 62.0Mbps rcv_space:28960
ESTAB  0  273672  :::192.168.0.143:5201
:::192.168.0.1:55406 timer:(on,200ms,0) ino:17351 sk:6 <->
 skmem:(r0,rb328320,t55576,tb277760,f44024,w279560,o0,bl0) ts sack 
cubic wscale:7,5 rto:210 rtt:4.578/0.644 ato:40 mss:1448 cwnd:62 ssthresh:19 
bytes_acked:6114904 bytes_received:37 segs_out:4261 segs_in:2147 send 156.9Mbps 
lastsnd:10 lastrcv:1040 lastack:10 pacing_rate 188.2Mbps unacked:37 
rcv_space:28960
State  Recv-Q Send-Q Local Address:Port Peer Address:Port   
 
ESTAB  0  136112  :::192.168.0.143:5201
:::192.168.0.1:55404 timer:(on,210ms,0) ino:17349 sk:4 <->
 skmem:(r0,rb328320,t15776,tb201600,f142512,w140112,o0,bl0) ts sack 
cubic wscale:7,5 rto:210 rtt:6.518/1.397 ato:40 mss:1448 cwnd:45 ssthresh:16 
bytes_acked:8954432 bytes_received:37 segs_out:6196 segs_in:3124 send 80.0Mbps 
lastsnd:10 lastrcv:2090 pacing_rate 96.0Mbps unacked:11 rcv_space:28960
ESTAB  0  0   :::192.168.0.143:5201
:::192.168.0.1:55402 ino:18856 sk:5 <->
 skmem:(r0,rb328320,t0,tb44800,f0,w0,o0,bl0) ts sack cubic wscale:7,5 
rto:210 rtt:3.738/6.898 ato:40 mss:1448 cwnd:10 bytes_acked:4 

RE: [PATCH v8] can: xilinx: Convert to runtime_pm

2015-11-10 Thread Appana Durga Kedareswara Rao
Ping!!

> -Original Message-
> From: Kedareswara rao Appana [mailto:appana.durga@xilinx.com]
> Sent: Monday, October 26, 2015 11:42 AM
> To: Anirudha Sarangi; w...@grandegger.com; m...@pengutronix.de; Michal
> Simek; Soren Brinkmann
> Cc: linux-...@vger.kernel.org; netdev@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Appana Durga
> Kedareswara Rao
> Subject: [PATCH v8] can: xilinx: Convert to runtime_pm
> 
> Instead of enabling/disabling clocks at several locations in the driver,
> use the runtime_pm framework. This consolidates the actions for runtime PM
> in the appropriate callbacks and makes the driver more readable and
> maintainable.
> 
> Signed-off-by: Kedareswara rao Appana 
> ---
> Changes for v8:
>   - Remove pm_runtime_irq_safe() API call from the probe as
> clk_prepare_enable
> Call can be called from the atomic context as suggested by Marc.
> Changes for v7:
>   - Removed the unnecessary clk_prepare/clk_unprepare calls
> From  the probe and remove as suggested by Soren.
> Changes for v6:
>  - Updated the driver with review comments as suggested by Marc.
> Changes for v5:
>  - Updated with the review comments.
>    Updated the remove function to use runtime_pm.
> Changes for v4:
>  - Updated with the review comments.
> Changes for v3:
>   - Converted the driver to use runtime_pm.
> Changes for v2:
>   - Removed the struct platform_device* from suspend/resume
> as suggested by Lothar
> 
>  drivers/net/can/xilinx_can.c | 176 +--
>  1 file changed, 101 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> index fc55e8e..ad38065 100644
> --- a/drivers/net/can/xilinx_can.c
> +++ b/drivers/net/can/xilinx_can.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #define DRIVER_NAME  "xilinx_can"
> 
> @@ -138,7 +139,7 @@ struct xcan_priv {
>   u32 (*read_reg)(const struct xcan_priv *priv, enum xcan_reg reg);
>   void (*write_reg)(const struct xcan_priv *priv, enum xcan_reg reg,
>   u32 val);
> - struct net_device *dev;
> + struct device *dev;
>   void __iomem *reg_base;
>   unsigned long irq_flags;
>   struct clk *bus_clk;
> @@ -843,6 +844,13 @@ static int xcan_open(struct net_device *ndev)
>   struct xcan_priv *priv = netdev_priv(ndev);
>   int ret;
> 
> + ret = pm_runtime_get_sync(priv->dev);
> + if (ret < 0) {
> + netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> + __func__, ret);
> + return ret;
> + }
> +
>   ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
>   ndev->name, ndev);
>   if (ret < 0) {
> @@ -850,29 +858,17 @@ static int xcan_open(struct net_device *ndev)
>   goto err;
>   }
> 
> - ret = clk_prepare_enable(priv->can_clk);
> - if (ret) {
> - netdev_err(ndev, "unable to enable device clock\n");
> - goto err_irq;
> - }
> -
> - ret = clk_prepare_enable(priv->bus_clk);
> - if (ret) {
> - netdev_err(ndev, "unable to enable bus clock\n");
> - goto err_can_clk;
> - }
> -
>   /* Set chip into reset mode */
>   ret = set_reset_mode(ndev);
>   if (ret < 0) {
>   netdev_err(ndev, "mode resetting failed!\n");
> - goto err_bus_clk;
> + goto err_irq;
>   }
> 
>   /* Common open */
>   ret = open_candev(ndev);
>   if (ret)
> - goto err_bus_clk;
> + goto err_irq;
> 
>   ret = xcan_chip_start(ndev);
>   if (ret < 0) {
> @@ -888,13 +884,11 @@ static int xcan_open(struct net_device *ndev)
> 
>  err_candev:
>   close_candev(ndev);
> -err_bus_clk:
> - clk_disable_unprepare(priv->bus_clk);
> -err_can_clk:
> - clk_disable_unprepare(priv->can_clk);
>  err_irq:
>   free_irq(ndev->irq, ndev);
>  err:
> + pm_runtime_put(priv->dev);
> +
>   return ret;
>  }
> 
> @@ -911,12 +905,11 @@ static int xcan_close(struct net_device *ndev)
>   netif_stop_queue(ndev);
>   napi_disable(>napi);
>   xcan_chip_stop(ndev);
> - clk_disable_unprepare(priv->bus_clk);
> - clk_disable_unprepare(priv->can_clk);
>   free_irq(ndev->irq, ndev);
>   close_candev(ndev);
> 
>   can_led_event(ndev, CAN_LED_EVENT_STOP);
> + pm_runtime_put(priv->dev);
> 
>   return 0;
>  }
> @@ -935,27 +928,20 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
>   struct xcan_priv *priv = netdev_priv(ndev);
>   int ret;
> 
> - ret = clk_prepare_enable(priv->can_clk);
> - if (ret)
> - goto err;
> -
> - ret = clk_prepare_enable(priv->bus_clk);
> - if (ret)
> - goto err_clk;
> + ret = pm_runtime_get_sync(priv->dev);
> + if (ret < 0) {
> + netdev_err(ndev, "%s: 

[PATCH] netfilter: nfnetlink_log: work around uninitialized variable warning

2015-11-10 Thread Arnd Bergmann
After a recent (correct) change, gcc started warning about the use
of the 'flags' variable in nfulnl_recv_config()

net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config':
net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here

The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or
higher (including 5.2.1, which is the latest version I checked). I
tried working around it by rearranging the code but had no success
with that.

As a last resort, this initializes the variable to zero, which shuts
up the warning, but means that we don't get a warning if the code
is ever changed in a way that actually causes the variable to be
used without first being written.

Signed-off-by: Arnd Bergmann 
Fixes: 8cbc870829ec ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity")
---

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 06eb48fceb42..740cce4685ac 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -825,7 +825,7 @@ nfulnl_recv_config(struct sock *ctnl, struct sk_buff *skb,
struct net *net = sock_net(ctnl);
struct nfnl_log_net *log = nfnl_log_pernet(net);
int ret = 0;
-   u16 flags;
+   u16 flags = 0;
 
if (nfula[NFULA_CFG_CMD]) {
u_int8_t pf = nfmsg->nfgen_family;
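
For illustration, the pattern behind this kind of warning can be sketched in
plain userspace C. This is a minimal sketch, not the nfnetlink_log code: the
struct cfg_attrs type and the apply_config() function are hypothetical names
made up for the example. The variable is written under one condition and read
under a matching one, which the compiler may fail to correlate, so it is
initialized explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a parsed attribute table. */
struct cfg_attrs {
	const uint16_t *flags_attr;	/* NULL when the attribute is absent */
};

/* Returns the flags that would be applied, or 0 if none were supplied. */
static uint16_t apply_config(const struct cfg_attrs *a)
{
	uint16_t flags = 0;	/* explicit init avoids -Wmaybe-uninitialized */

	if (a->flags_attr)
		flags = *a->flags_attr;

	/* Unrelated processing would sit here and can make the data flow
	 * too hard for the compiler to follow. */

	if (a->flags_attr)
		return flags;	/* only read when it was written above */

	return 0;
}
```

The trade-off is the one named in the commit message: with the initializer in
place, the compiler can no longer flag a later change that really does read
the variable before writing it.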



[iproute PATCH] iproute: restrict hoplimit values to be in range [1;255]

2015-11-10 Thread Phil Sutter
Technically, the range of possible hoplimit values is defined by the IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0 makes little sense though, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating that the system's default hoplimit (configurable via
sysctl) should be used. In iproute, setting the hoplimit of a route to 0
is equivalent to omitting the hoplimit parameter altogether, so it is not
necessary to allow that value to be specified.

Signed-off-by: Phil Sutter 
---
 ip/iproute.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index c0ef7bf..e0c8e4c 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -931,7 +931,8 @@ static int iproute_modify(int cmd, unsigned flags, int 
argc, char **argv)
mxlock |= (1< 255)
invarg("\"hoplimit\" value is invalid\n", 
*argv);
rta_addattr32(mxrta, sizeof(mxbuf), RTAX_HOPLIMIT, 
hoplimit);
} else if (strcmp(*argv, "advmss") == 0) {
-- 
2.1.2
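
For illustration, the range restriction described in this patch can be
sketched as a standalone userspace helper. This is only a sketch under
assumptions: parse_hoplimit() is a hypothetical name and is not an iproute2
function, and it uses strtoul() rather than iproute2's own parsing helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of iproute2: parse a decimal hoplimit and
 * accept it only if it lies in [1;255]. On success the value is stored in
 * *out; on failure *out is left untouched. */
static bool parse_hoplimit(const char *arg, unsigned int *out)
{
	char *end;
	unsigned long val;

	if (arg == NULL || *arg == '\0')
		return false;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0')
		return false;	/* overflow or trailing garbage */

	/* 0 means "use the system default", so it is rejected here. */
	if (val == 0 || val > 255)
		return false;

	*out = (unsigned int)val;
	return true;
}
```

Rejecting 0 loses nothing for the user, since leaving out the parameter
already selects the system default.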



[PATCH, REPORT] bpf_trace: build error without PERF_EVENTS

2015-11-10 Thread Arnd Bergmann
In my ARM randconfig tests, I'm getting a build error for
newly added code in bpf_perf_event_read and bpf_perf_event_output
whenever CONFIG_PERF_EVENTS is disabled:

kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 
'oncpu'
if (event->oncpu != smp_processor_id() ||
 ^
kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 
'pmu'
  event->pmu->count)

This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
is disabled. I'm not sure if that is a configuration we care
about; if not, we could prevent this case from occurring by
adding Kconfig dependencies.

Simply hiding the broken code inside #ifdef CONFIG_PERF_EVENTS,
as this patch does, seems to reliably fix the error as well:
I have built thousands of randconfig kernels since I started
seeing this and added the workaround.

Signed-off-by: Arnd Bergmann 
Fixes: 62544ce8e01c ("bpf: fix bpf_perf_event_read() helper")
Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
---
I suspect my patch is not the right answer, but could someone please
fix this?

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4228fd3682c3..82e0bc9d002a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -186,6 +186,7 @@ const struct bpf_func_proto 
*bpf_get_trace_printk_proto(void)
return &bpf_trace_printk_proto;
 }
 
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
 static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, u64 r4, u64 r5)
 {
struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
@@ -263,6 +264,7 @@ static const struct bpf_func_proto 
bpf_perf_event_output_proto = {
.arg4_type  = ARG_PTR_TO_STACK,
.arg5_type  = ARG_CONST_STACK_SIZE,
 };
+#endif
 
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id 
func_id)
 {
@@ -289,10 +291,12 @@ static const struct bpf_func_proto 
*kprobe_prog_func_proto(enum bpf_func_id func
return bpf_get_trace_printk_proto();
case BPF_FUNC_get_smp_processor_id:
return &bpf_get_smp_processor_id_proto;
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
case BPF_FUNC_perf_event_read:
return &bpf_perf_event_read_proto;
case BPF_FUNC_perf_event_output:
return &bpf_perf_event_output_proto;
+#endif
default:
return NULL;
}
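
For illustration, the guard approach taken by this patch can be reduced to a
small userspace sketch. All names here are assumptions made for the example:
HAVE_PERF_EVENTS stands in for the kernel config option, and func_id,
func_proto and lookup_proto() are hypothetical. When the macro is not
defined, both the helper and its switch case drop out, and the lookup falls
through to the default.

```c
#include <stddef.h>

/* Hypothetical identifiers; not the kernel's BPF_FUNC_* values. */
enum func_id { FUNC_TRACE_PRINTK, FUNC_PERF_EVENT_READ };

struct func_proto {
	const char *name;
};

static const struct func_proto trace_printk_proto = { "trace_printk" };

#ifdef HAVE_PERF_EVENTS
/* Only built when the optional feature is available. */
static const struct func_proto perf_event_read_proto = { "perf_event_read" };
#endif

static const struct func_proto *lookup_proto(enum func_id id)
{
	switch (id) {
	case FUNC_TRACE_PRINTK:
		return &trace_printk_proto;
#ifdef HAVE_PERF_EVENTS
	case FUNC_PERF_EVENT_READ:
		return &perf_event_read_proto;
#endif
	default:
		return NULL;	/* helper unknown or compiled out */
	}
}
```

The cost is a preprocessor conditional in two places for each optional
helper.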



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Andy Shevchenko
On Wed, Nov 11, 2015 at 1:07 AM, Måns Rullgård  wrote:
> Andy Shevchenko  writes:

> +   nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
> +   wmb();  /* ensure desc addr is written before starting 
> DMA */

 Hm… Have I missed corresponding rmb() ? If it's about MMIO, perhaps 
 mmiowb() ?
>>>
>>> Possibly.
>>
>> Standalone wmb() doesn't make sense.
>
> It does if you need to enforce ordering between normal and I/O memory.
> In fact, since the descriptor is filled in using normal memory accesses,
> my understanding is that mmiowb() would be insufficient here.  The
> comment could be improved, however.

Can you then explain what exactly you are assured against in all cases
where you are using wmb()s? It seems I don't recognize this part in
some excerpts.


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread David Miller
From: Måns Rullgård 
Date: Wed, 11 Nov 2015 00:40:09 +

> When the DMA complete interrupt arrives, the next chain should be
> kicked off as quickly as possible, and I don't see why that would
> benefit from being done in napi context.

NAPI isn't about low latency, it's about fairness and interrupt
mitigation.

You probably don't even realize that all of the TX SKB freeing you do
in the hardware interrupt handler end up being actually processed by a
scheduled software interrupt anyways.

So you are gaining almost nothing by not doing TX completion in NAPI
context, whereas by doing so you would be gaining a lot including
more simplified locking or even the ability to do no locking at all.


Re: [PATCH] drivers: net: cpsw: add support for fixed-links.

2015-11-10 Thread Daniel Trautmann
On Mon, Nov 09, 2015 at 10:49:51PM -0500, David Miller wrote:
> From: Daniel Trautmann 
> Date: Mon, 9 Nov 2015 20:24:14 +0100
> 
> > Add support for fixed-links in configurations without PHY.
> > (e.g. connection to a switch, SGMII point to point, SFPs)
> > 
> > Check: Documentation/devicetree/bindings/net/fixed-link.txt.
> > 
> > Signed-off-by: Daniel Trautmann 
> 
> This patch doesn't apply cleanly at all.

Sorry, I was working on the wrong branch.
I found there is a similar patch already in netdev-next; I will
check this and submit a new patch if necessary.


[PATCH] Revert "bridge: Allow forward delay to be cfgd when STP enabled"

2015-11-10 Thread Vladislav Yasevich
This reverts commit 34c2d9fb0498c066afbe610b15e18995fd8be792.

There are 2 reasons for this revert:
 1)  The commit in question doesn't do what it says it does.  The
 description reads: "Allow bridge forward delay to be configured
 when Spanning Tree is enabled."  This was already the case before
 the commit was made.  What the commit actually did was disallow
 invalid values of 'forward_delay' when STP was turned off.

 2)  The above change was actually a change in the user observed
 behavior and broke things like libvirt and other network configs
 that set 'forward_delay' to 0 without enabling STP.  The value
 of 0 is actually used when STP is turned off to immediately mark
 the bridge as forwarding.

Signed-off-by: Vlad Yasevich 
---
 net/bridge/br_stp.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 80c34d7..f7e8dee 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -600,12 +600,17 @@ void __br_set_forward_delay(struct net_bridge *br, 
unsigned long t)
 int br_set_forward_delay(struct net_bridge *br, unsigned long val)
 {
unsigned long t = clock_t_to_jiffies(val);
-
-   if (t < BR_MIN_FORWARD_DELAY || t > BR_MAX_FORWARD_DELAY)
-   return -ERANGE;
+   int err = -ERANGE;
 
spin_lock_bh(&br->lock);
+   if (br->stp_enabled != BR_NO_STP &&
+   (t < BR_MIN_FORWARD_DELAY || t > BR_MAX_FORWARD_DELAY))
+   goto unlock;
+
__br_set_forward_delay(br, t);
+   err = 0;
+
+unlock:
spin_unlock_bh(&br->lock);
-   return 0;
+   return err;
 }
-- 
1.9.3
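
For illustration, the behavior restored by this revert can be sketched in
userspace C. This is a simplified model under assumptions: struct bridge and
set_forward_delay() are hypothetical, the bounds are illustrative numbers
rather than the kernel's jiffies-based BR_MIN_FORWARD_DELAY and
BR_MAX_FORWARD_DELAY, and the bridge lock is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative bounds only; the kernel uses its own constants in jiffies. */
#define MIN_FORWARD_DELAY 2
#define MAX_FORWARD_DELAY 30

/* Hypothetical, heavily reduced model of a bridge. */
struct bridge {
	bool stp_enabled;
	unsigned long forward_delay;
};

/* Returns 0 on success or -ERANGE if the value is rejected. */
static int set_forward_delay(struct bridge *br, unsigned long t)
{
	/* The range is only enforced while STP is running. */
	if (br->stp_enabled &&
	    (t < MIN_FORWARD_DELAY || t > MAX_FORWARD_DELAY))
		return -ERANGE;

	br->forward_delay = t;
	return 0;
}
```

With STP off, a delay of 0 is accepted, which is the case the revert message
says libvirt and similar configurations rely on.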



Re: [iproute PATCH v2 4/6] ipaddress: fix ipaddr_flush for Linux >= 3.1

2015-11-10 Thread Phil Sutter
On Mon, Nov 09, 2015 at 09:58:50PM +0300, Sergei Shtylyov wrote:
> On 11/09/2015 09:51 PM, Sergei Shtylyov wrote:
> 
> >> Linux version 3.1 introduced a consistency check for netlink dumps in
> >> commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
> >
> > The scripts/checkpatch.pl now enforces 12-digit commit ID...
> 
> Sorry, didn't realize it wasn't a kernel patch. :-)

Heh, no problem! If more projects used checkpatch.pl, the world would be
a better place.

My quoting commits was usually a matter of copying what 'git --oneline
show' outputs. Thanks for the heads-up, I've adjusted core.abbrev now.

Cheers, Phil


Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction

2015-11-10 Thread Alexei Starovoitov
On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>aarch64 doesn't have native support for XADD instruction, implement it by
> >>the below instruction sequence:
> >>
> >>Load (dst + off) to a register
> >>Add src to it
> >>Store it back to (dst + off)
> >
> >Not really what is needed ?
> >
> >See this BPF_XADD as an atomic_add() equivalent.
> 
> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
> If so it should need load-acquire/store-release.

I think doc is clear enough, but it can always be improved. Pls suggest a patch.
It's quite hard to write a test for atomicity in test_bpf framework, so
code review is the key. Eric, thanks for catching it!
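
For illustration, the difference between the load/add/store sequence and an
atomic add can be sketched with C11 atomics in userspace. This is an analogy
under assumptions, not JIT code: plain_add() and xadd() are hypothetical
names, and the choice of relaxed memory ordering is an assumption made for
the example rather than something stated in this thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Non-atomic: a concurrent writer can slip in between the load and the
 * store, and its update is then lost. */
static void plain_add(uint64_t *dst, uint64_t src)
{
	uint64_t tmp = *dst;	/* load  */

	tmp += src;		/* add   */
	*dst = tmp;		/* store */
}

/* Atomic read-modify-write: the whole update is indivisible. */
static void xadd(_Atomic uint64_t *dst, uint64_t src)
{
	atomic_fetch_add_explicit(dst, src, memory_order_relaxed);
}
```

Both functions give the same result when only one thread runs; they differ
only when several writers update the same location at once, which is why
this is hard to cover with a simple test.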



Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-10 Thread Måns Rullgård
Francois Romieu  writes:

> Mans Rullgard  :
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> new file mode 100644
>> index 000..11cd389
>> --- /dev/null
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
> [...]
>> +static int nb8800_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
> [...]
>> +
>> +netdev_sent_queue(dev, skb->len);
>> +
>> +smp_mb__before_spinlock();
>> +spin_lock_irqsave(&priv->tx_lock, flags);
>
> At some point you may consider performing both Tx and Rx from napi context
> and thus replacing priv->tx_lock with netif_tx_lock.

That lock is to synchronise the DMA start between nb8800_xmit() and the
interrupt handler.  When the DMA complete interrupt arrives, the next
chain should be kicked off as quickly as possible, and I don't see why
that would benefit from being done in napi context.

>> +
>> +if (!skb->xmit_more) {
>> +priv->tx_chain->ready = true;
>> +priv->tx_chain = NULL;
>> +nb8800_tx_dma_start(dev);
>> +}
>> +
>> +spin_unlock_irqrestore(&priv->tx_lock, flags);
>> +
>> +priv->tx_next = next;
>
> Are there strong reasons why nb8800_tx_done could not kick between
> spin_unlock_irqrestore and the non-local update of priv->tx_next ?

Good catch.  priv->tx_next wasn't accessed elsewhere in an earlier
version, and I forgot to fix that.  nb8800_tx_done() makes sure the DMA
has actually finished, so priv->tx_next should be updated before
starting the DMA rather than after.  The check against tx_next in
nb8800_tx_done() is only to put some limit on the loop and to avoid
confusion when nb8800_dma_stop() does its dance.  There should be no
need for more synchronisation here than what the already present memory
barriers provide.

> [...]
>> +static irqreturn_t nb8800_irq(int irq, void *dev_id)
>> +{
>> +struct net_device *dev = dev_id;
>> +struct nb8800_priv *priv = netdev_priv(dev);
>> +u32 val;
>> +
>> +/* tx interrupt */
>> +val = nb8800_readl(priv, NB8800_TXC_SR);
>> +if (val) {
> [...]
>> +}
>> +
>> +/* rx interrupt */
>> +val = nb8800_readl(priv, NB8800_RXC_SR);
>> +if (val) {
> [...]
>> +}
>> +
>> +return IRQ_HANDLED;
>
> Returning IRQ_HANDLED is fine if one of these holds:
> 1. you're sure that at least one of the "if" branches is used
> 2. you'll be able to quickly figure out what's happening whenever 1. stops
>being true.

You're right, better to check that the device really did have something
to say.

-- 
Måns Rullgård
m...@mansr.com
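
For illustration, the point about the return value can be sketched as a
userspace model of an interrupt handler. Everything here is an assumption
made for the example: struct fake_dev, fake_irq() and the IRQ_RES_* values
are hypothetical stand-ins, and the status fields merely imitate the TX and
RX status registers read in the quoted code.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's irqreturn_t values. */
enum irq_result { IRQ_RES_NONE = 0, IRQ_RES_HANDLED = 1 };

/* Hypothetical device model with two status words. */
struct fake_dev {
	uint32_t tx_status;	/* imitates the TX status register */
	uint32_t rx_status;	/* imitates the RX status register */
	unsigned int tx_events;
	unsigned int rx_events;
};

static enum irq_result fake_irq(struct fake_dev *dev)
{
	enum irq_result ret = IRQ_RES_NONE;

	if (dev->tx_status) {
		dev->tx_status = 0;	/* acknowledge */
		dev->tx_events++;
		ret = IRQ_RES_HANDLED;
	}

	if (dev->rx_status) {
		dev->rx_status = 0;	/* acknowledge */
		dev->rx_events++;
		ret = IRQ_RES_HANDLED;
	}

	/* Report "not ours" if the device had nothing to say. */
	return ret;
}
```

Returning the "none" value when neither status word is set is what lets the
caller notice a spurious or misrouted interrupt.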


Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode

2015-11-10 Thread Eric Dumazet
On Tue, 2015-11-10 at 21:41 -0800, Tom Herbert wrote:
> Tolga, are you still planning to respin this patch (when tree opens?)

I was planning to add a union on skc_tx_queue_mapping and
sk_max_ack_backlog, so that adding a check on sk_max_ack_backlog in
listener lookup would not add an additional cache line miss.

This would remove false sharing because sk_ack_backlog is often dirtied
when a socket is added into accept queue.





[PATCH net] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Hannes Frederic Sowa
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in the process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Cc: Eric Dumazet 
Signed-off-by: Hannes Frederic Sowa 
---
 net/unix/af_unix.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index aaa0b58..b2c4131 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -441,6 +441,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
if (state == TCP_LISTEN)
unix_release_sock(skb->sk, 1);
/* passed fds are erased in the kfree_skb hook*/
+   UNIXCB(skb).consumed = skb->len;
kfree_skb(skb);
}
 
@@ -2152,7 +2153,7 @@ unlock:
}
 
chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
-   chunk = state->recv_actor(skb, skip, chunk, state);
+   chunk = state->recv_actor(skb_get(skb), skip, chunk, state);
if (chunk < 0) {
if (copied == 0)
copied = -EFAULT;
@@ -2161,6 +2162,21 @@ unlock:
copied += chunk;
size -= chunk;
 
+   if (!unix_skb_len(skb)) {
+   /* the skb was touched by a concurrent reader;
+* we should not expect anything from this skb
+* anymore and assume it invalid - we can be
+* sure it was dropped from the socket queue
+*
+* let's report a short read
+*/
+   consume_skb(skb);
+   err = 0;
+   break;
+   }
+
+   consume_skb(skb);
+
/* Mark read part of skb as used */
if (!(flags & MSG_PEEK)) {
UNIXCB(skb).consumed += chunk;
-- 
2.5.0
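
For illustration, the two ideas in this patch (pin the buffer before dropping
the lock, then treat a fully consumed buffer as a short read) can be sketched
with a toy reference-counted buffer in userspace. This is a single-threaded
model under assumptions: struct buf and the buf_*() and read_one() functions
are hypothetical, the "concurrent reader" is simulated by a callback, and no
real locking is shown.

```c
#include <stdlib.h>

/* Toy reference-counted buffer; not the kernel's sk_buff. */
struct buf {
	int refs;
	int len;
	int consumed;
};

static struct buf *buf_alloc(int len)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b) {
		b->refs = 1;	/* reference held by the receive queue */
		b->len = len;
	}
	return b;
}

static struct buf *buf_get(struct buf *b)
{
	b->refs++;
	return b;
}

static void buf_put(struct buf *b)
{
	if (--b->refs == 0)
		free(b);
}

static int buf_remaining(const struct buf *b)
{
	return b->len - b->consumed;
}

/* Simulated second reader: consumes everything and drops the queue's
 * reference, as if the buffer had been unlinked and freed. */
static void steal_all(struct buf *b)
{
	b->consumed = b->len;
	buf_put(b);
}

/* Models one iteration of the read loop. 'concurrent' runs while the lock
 * would be dropped; pass NULL for no interference. Returns the number of
 * bytes accounted to this reader, or 0 for a short read. */
static int read_one(struct buf *b, int want, void (*concurrent)(struct buf *))
{
	int chunk = buf_remaining(b) < want ? buf_remaining(b) : want;

	buf_get(b);		/* pin before dropping the lock */
	if (concurrent)
		concurrent(b);	/* another reader may run here */

	if (buf_remaining(b) == 0) {
		buf_put(b);	/* consumed by someone else: short read */
		return 0;
	}

	buf_put(b);
	b->consumed += chunk;
	return chunk;
}
```

Without the extra reference taken in read_one(), the check after the
callback would read memory that the simulated second reader had already
freed.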



Re: [PATCH, REPORT] bpf_trace: build error without PERF_EVENTS

2015-11-10 Thread Steven Rostedt
On Tue, 10 Nov 2015 14:31:38 +0100
Daniel Borkmann  wrote:

> On 11/10/2015 01:55 PM, Arnd Bergmann wrote:
> > In my ARM randconfig tests, I'm getting a build error for
> > newly added code in bpf_perf_event_read and bpf_perf_event_output
> > whenever CONFIG_PERF_EVENTS is disabled:
> >
> > kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
> > kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member 
> > named 'oncpu'
> > if (event->oncpu != smp_processor_id() ||
> >   ^
> > kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member 
> > named 'pmu'
> >event->pmu->count)
> >
> > This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
> > is disabled. I'm not sure if that is a configuration we care
> > about, otherwise we could prevent this case from occurring by
> > adding Kconfig dependencies.  
> 
> I think that seems better than spreading #if IS_ENABLEDs into the code.
> Probably enough to add a 'depends on PERF_EVENTS' to config BPF_EVENTS,
> so it's also explicitly documented.
> 

So just do the following then?

-- Steve

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8d6363f42169..f5aecff2d243 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -434,7 +434,7 @@ config UPROBE_EVENT
 
 config BPF_EVENTS
depends on BPF_SYSCALL
-   depends on KPROBE_EVENT || UPROBE_EVENT
+   depends on KPROBE_EVENT && UPROBE_EVENT
bool
default y
help


Re: [PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Andrew Lunn
On Tue, Nov 10, 2015 at 09:25:51AM -0500, Vivien Didelot wrote:
> Hi Neil,
> 
> On Nov. Tuesday 10 (46) 02:25 PM, Neil Armstrong wrote:
> > To align with the mv88e6xxx code, add a similar header file
> > with all the register defines.
> > The file is based on the mv88e6xxx header for coherency.
> > 
> > Signed-off-by: Neil Armstrong 
> 
> In the RFC patchset, Andrew mentioned that there is not that much things in
> common with mv88e6xxx, so I don't really see a value to add a separate header
> file. Would that make sense to you guys to add the defines directly in
> mv88e6060.c and squash that in the last patch?

It is personal taste, but i think there are enough defines that having
a separate header file is useful. For < 10 i would agree with Vivien,
but with ~100, i prefer a header file. 

  Andrew


Re: [PATCH net] af-unix: fix use-after-free with concurrent readers while splicing

2015-11-10 Thread Hannes Frederic Sowa
On Tue, Nov 10, 2015, at 15:26, Hannes Frederic Sowa wrote:
>  net/unix/af_unix.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index aaa0b58..b2c4131 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -441,6 +441,7 @@ static void unix_release_sock(struct sock *sk, int
> embrion)
>   if (state == TCP_LISTEN)
>   unix_release_sock(skb->sk, 1);
>   /* passed fds are erased in the kfree_skb hook*/
> +   UNIXCB(skb).consumed = skb->len;
>   kfree_skb(skb);
>   }
>  
> @@ -2152,7 +2153,7 @@ unlock:
>   }
>  
>   chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
> -   chunk = state->recv_actor(skb, skip, chunk, state);
> +   chunk = state->recv_actor(skb_get(skb), skip, chunk,
> state);
>   if (chunk < 0) {
>   if (copied == 0)
>   copied = -EFAULT;

I forgot a consume_skb here. Will send v2. Seems like I can review
patches much better in a mail client. :}

Bye,
Hannes


Re: [PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Vivien Didelot
On Nov. Tuesday 10 (46) 03:30 PM, Andrew Lunn wrote:
> On Tue, Nov 10, 2015 at 09:25:51AM -0500, Vivien Didelot wrote:
> > Hi Neil,
> > 
> > On Nov. Tuesday 10 (46) 02:25 PM, Neil Armstrong wrote:
> > > To align with the mv88e6xxx code, add a similar header file
> > > with all the register defines.
> > > The file is based on the mv88e6xxx header for coherency.
> > > 
> > > Signed-off-by: Neil Armstrong 
> > 
> > In the RFC patchset, Andrew mentioned that there is not that much things in
> > common with mv88e6xxx, so I don't really see a value to add a separate 
> > header
> > file. Would that make sense to you guys to add the defines directly in
> > mv88e6060.c and squash that in the last patch?
> 
> It is personal taste, but i think there are enough defines that having
> a separate header file is useful. For < 10 i would agree with Vivien,
> but with ~100, i prefer a header file. 

OK. So please fix the copyright owner of this new file then and we're good :-)

Thanks,
-v


Re: [PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Neil Armstrong
On 11/10/2015 03:25 PM, Vivien Didelot wrote:
> Hi Neil,
> 
> On Nov. Tuesday 10 (46) 02:25 PM, Neil Armstrong wrote:
>> To align with the mv88e6xxx code, add a similar header file
>> with all the register defines.
>> The file is based on the mv88e6xxx header for coherency.
>>
>> Signed-off-by: Neil Armstrong 
> 
> In the RFC patchset, Andrew mentioned that there is not that much things in
> common with mv88e6xxx, so I don't really see a value to add a separate header
> file. Would that make sense to you guys to add the defines directly in
> mv88e6060.c and squash that in the last patch?
> 
>> ---
>>  drivers/net/dsa/mv88e6060.h | 108 
>> 
>>  1 file changed, 108 insertions(+)
>>  create mode 100644 drivers/net/dsa/mv88e6060.h
>>
>> diff --git a/drivers/net/dsa/mv88e6060.h b/drivers/net/dsa/mv88e6060.h
>> new file mode 100644
>> index 000..adbc894
>> --- /dev/null
>> +++ b/drivers/net/dsa/mv88e6060.h
>> @@ -0,0 +1,108 @@
>> +/*
>> + * net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
>> + * Copyright (c) 2008 Marvell Semiconductor
> 
> Also I don't think the copyright notice is correct here.
> 
> Thanks,
> -v
> 
Vivien,

Is something like this OK ?
/*
 * drivers/net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
 * Copyright (c) 2015 Neil Armstrong
 *
 * Based on mv88e6xxx.h
 * Copyright (c) 2008 Marvell Semiconductor
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

Thanks,
Neil


[PATCH] netfilter: fix xt_TEE and xt_TPROXY dependencies

2015-11-10 Thread Arnd Bergmann
Kconfig is too smart for its own good: a Kconfig line that states

select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES

means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will
also be set to 'm', regardless of the state of the symbol from which
it is selected. When the xt_TEE driver is built-in and nothing else
forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error:

net/built-in.o: In function `tee_tg6':
net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6'

This works around that behavior by changing the dependency to
'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression
rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to
be built-in as well.

The bug only occurs once in thousands of 'randconfig' builds and
does not really impact real users. From inspecting the other
surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY
and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this
change should still be harmless.

Signed-off-by: Arnd Bergmann 
---
I have done a few thousand randconfig builds with this applied, and the
problem did not come back, but it is super-rare.

Several people have tried to fix this in the past, but so far
every patch was wrong. Maybe this one is lucky.

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e22349ea7256..4692782b5280 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -869,7 +869,7 @@ config NETFILTER_XT_TARGET_TEE
depends on IPV6 || IPV6=n
depends on !NF_CONNTRACK || NF_CONNTRACK
select NF_DUP_IPV4
-   select NF_DUP_IPV6 if IP6_NF_IPTABLES
+   select NF_DUP_IPV6 if IP6_NF_IPTABLES != n
---help---
This option adds a "TEE" target with which a packet can be cloned and
this clone be rerouted to another nexthop.
@@ -882,7 +882,7 @@ config NETFILTER_XT_TARGET_TPROXY
depends on IP6_NF_IPTABLES || IP6_NF_IPTABLES=n
depends on IP_NF_MANGLE
select NF_DEFRAG_IPV4
-   select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
+   select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES != n
help
  This option adds a `TPROXY' target, which is somewhat similar to
  REDIRECT.  It can only be used in the mangle table and is useful
@@ -1375,7 +1375,7 @@ config NETFILTER_XT_MATCH_SOCKET
depends on IPV6 || IPV6=n
depends on IP6_NF_IPTABLES || IP6_NF_IPTABLES=n
select NF_DEFRAG_IPV4
-   select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
+   select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES != n
help
  This option adds a `socket' match, which can be used to match
  packets for which a TCP or UDP socket lookup finds a valid socket.
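
For illustration, the tristate arithmetic behind this fix can be modelled in
a few lines of userspace C. This is a sketch under assumptions: it follows my
reading of the Kconfig language rules (a conditional select forces at least
the minimum of the selecting symbol and the condition, and a comparison
yields only y or n), and tri_and(), tri_ne() and select_floor() are
hypothetical names.

```c
/* Tristate values ordered n < m < y, as in Kconfig. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "&&" on tristates is the minimum of the operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "!=" compares symbol values and yields only y or n. */
static enum tristate tri_ne(enum tristate a, enum tristate b)
{
	return a != b ? TRI_Y : TRI_N;
}

/* Lower bound that "select TARGET if COND" inside symbol SYM places on
 * TARGET. */
static enum tristate select_floor(enum tristate sym, enum tristate cond)
{
	return tri_and(sym, cond);
}
```

With a built-in selecting symbol and IP6_NF_IPTABLES=m, select_floor(TRI_Y,
TRI_M) gives only m, which matches the link failure described above, while
select_floor(TRI_Y, tri_ne(TRI_M, TRI_N)) gives y.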



Re: [PATCH] net: netfilter: fix GCC uninitialized warning

2015-11-10 Thread Pablo Neira Ayuso
On Fri, Nov 06, 2015 at 10:13:16PM +0300, Dmitry Safonov wrote:
> I thought it was decided to use 0/NULL/whatever rather than uninitialized_var()?
> Is that right now?
> http://thread.gmane.org/gmane.linux.kernel/1383415

I overlooked that one. We should stick to mainstream policies as much as
possible.

Arnd just sent a patch to address one of these by initializing the
variable: http://patchwork.ozlabs.org/patch/542259/

Please follow up with a patch to initialize the variable to avoid the
warning in other spots.

Thanks.


Re: [PATCH 0/6] net: dsa: mv88e6060: cleanup and fix setup

2015-11-10 Thread Andrew Lunn
On Tue, Nov 10, 2015 at 02:25:05PM +0100, Neil Armstrong wrote:
> This patchset introduces some fixes and a registers addressing cleanup for
> the mv88e6060 DSA driver.

Hi Neil

I gave my Acked-by to the RFC version. It is normal to include that in
following versions, so long as there have not been major changes.  When
you fix the double (), please add my Acked-by: to all the patches.

Thanks
Andrew


Re: [PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Vivien Didelot
Hi Neil,

On Nov. Tuesday 10 (46) 02:25 PM, Neil Armstrong wrote:
> To align with the mv88e6xxx code, add a similar header file
> with all the register defines.
> The file is based on the mv88e6xxx header for coherency.
> 
> Signed-off-by: Neil Armstrong 

In the RFC patchset, Andrew mentioned that there are not that many things in
common with mv88e6xxx, so I don't really see the value of adding a separate
header file. Would it make sense to you guys to add the defines directly in
mv88e6060.c and squash that into the last patch?

> ---
>  drivers/net/dsa/mv88e6060.h | 108 
> 
>  1 file changed, 108 insertions(+)
>  create mode 100644 drivers/net/dsa/mv88e6060.h
> 
> diff --git a/drivers/net/dsa/mv88e6060.h b/drivers/net/dsa/mv88e6060.h
> new file mode 100644
> index 000..adbc894
> --- /dev/null
> +++ b/drivers/net/dsa/mv88e6060.h
> @@ -0,0 +1,108 @@
> +/*
> + * net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
> + * Copyright (c) 2008 Marvell Semiconductor

Also I don't think the copyright notice is correct here.

Thanks,
-v


[PATCH net v2 0/3] packet fixes

2015-11-10 Thread Daniel Borkmann
Fixes a couple of issues in packet sockets, i.e. on TX ring side. See
individual patches for details.

v1 -> v2:
 - Added patch 2 as suggested by Dave
 - Rest is unchanged from previous submission

Daniel Borkmann (3):
  packet: do skb_probe_transport_header when we actually have data
  packet: always probe for transport header
  packet: fix tpacket_snd max frame and vlan handling

 net/packet/af_packet.c | 109 ++---
 1 file changed, 67 insertions(+), 42 deletions(-)

-- 
1.9.3



[PATCH net v2 2/3] packet: always probe for transport header

2015-11-10 Thread Daniel Borkmann
We concluded that skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector also
has little to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann 
Cc: Jason Wang 
---
 net/packet/af_packet.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 80c36c0..bdecf17 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2447,8 +2447,7 @@ static int tpacket_fill_skb(struct packet_sock *po, 
struct sk_buff *skb,
len = ((to_write > len_max) ? len_max : to_write);
}
 
-   if (!packet_use_direct_xmit(po))
-   skb_probe_transport_header(skb, 0);
+   skb_probe_transport_header(skb, 0);
 
return tp_len;
 }
@@ -2808,8 +2807,8 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
len += vnet_hdr_len;
}
 
-   if (!packet_use_direct_xmit(po))
-   skb_probe_transport_header(skb, reserve);
+   skb_probe_transport_header(skb, reserve);
+
if (unlikely(extra_len == 4))
skb->no_fcs = 1;
 
-- 
1.9.3



[PATCH net v2 3/3] packet: fix tpacket_snd max frame and vlan handling

2015-11-10 Thread Daniel Borkmann
There seem to be a couple of issues in the tpacket_snd() path. Since its
introduction in commit 69e3c75f4d54 ("net: TX_RING and packet mmap"),
TX_RING could be used from SOCK_DGRAM and SOCK_RAW side. When used
with SOCK_DGRAM only, the size_max > dev->mtu + reserve check should
have reserve as 0, but currently, this is unconditionally set (in its
original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then take
dev->mtu and dev->hard_header_len into account for SOCK_DGRAM, the
extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check.

Moreover, the check for ETH_P_8021Q seems buggy as we potentially
operate on non-linear header data. In sock_alloc_send_skb() we seem
to allocate enough linear head-room in most cases, so the test could
just yield false results. The sock_alloc_send_skb() has an extra room
of (for some reason) sizeof(struct sockaddr_ll), so we have enough
header room for at least ethernet header in case the device doesn't
ask for it. Note that in TX_RING's tpacket_fill_skb() we fetch the
frame data after the slot header (or at some provided offset), thus
from TX side (as opposed to RX_RING), it doesn't contain a sockaddr_ll
structure in between.

Anyway, the ETH_P_8021Q check would be better fixed in tpacket_fill_skb()
by making sure that in SOCK_RAW cases, where we deal with tp_len >
dev->mtu + dev->hard_header_len, to place at least the ethernet header
into the linear section (which is the case in SOCK_DGRAM already).
Doing this test on ring buffer data (either from set up skb or on data
directly) could race underneath us. The test is done at the end so we
can take both socket types into consideration. We check that
skb->protocol and also h_proto are correct in these cases. For packet
sockets that have proto of 0, guess the skb->protocol from the user
payload as suggested.

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for 
VLAN case")
Signed-off-by: Daniel Borkmann 
Cc: Eric Dumazet 
---
 net/packet/af_packet.c | 101 ++---
 1 file changed, 63 insertions(+), 38 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bdecf17..5d40ad3 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2320,12 +2320,12 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
sock_wfree(skb);
 }
 
-static bool ll_header_truncated(const struct net_device *dev, int len)
+static bool ll_header_truncated(int hard_header_len, int len)
 {
/* net device doesn't like empty head */
-   if (unlikely(len <= dev->hard_header_len)) {
+   if (unlikely(len <= hard_header_len)) {
net_warn_ratelimited("%s: packet size is too short (%d <= 
%d)\n",
-current->comm, len, dev->hard_header_len);
+current->comm, len, hard_header_len);
return true;
}
 
@@ -2333,8 +2333,9 @@ static bool ll_header_truncated(const struct net_device 
*dev, int len)
 }
 
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
-   void *frame, struct net_device *dev, int size_max,
-   __be16 proto, unsigned char *addr, int hlen)
+   void *frame, struct net_device *dev, int size_max,
+   __be16 proto, unsigned char *addr, int hlen,
+   int reserve)
 {
union tpacket_uhdr ph;
int to_write, offset, len, tp_len, nr_frags, len_max;
@@ -2400,22 +2401,45 @@ static int tpacket_fill_skb(struct packet_sock *po, 
struct sk_buff *skb,
to_write = tp_len;
 
if (sock->type == SOCK_DGRAM) {
-   err = dev_hard_header(skb, dev, ntohs(proto), addr,
-   NULL, tp_len);
+   /* In DGRAM sockets, we expect struct sockaddr_ll was filled
+* via struct msghdr, so we have dest mac and skb->protocol.
+* Otherwise there's not too much useful things we can do in
+* this flush run.
+*/
+   err = dev_hard_header(skb, dev, ntohs(skb->protocol), addr,
+ NULL, tp_len);
if (unlikely(err < 0))
return -EINVAL;
-   } else if (dev->hard_header_len) {
-   if (ll_header_truncated(dev, tp_len))
-   return -EINVAL;
+   } else {
+   /* If skb->protocol is still 0, try to infer/guess it. Might
+* not be fully reliable in the sense that a user could still
+* change/race data afterwards, but on the other hand the proto
+* can be set arbitrarily anyways. We only need to take care
+* in case 

[PATCH 1/6] net: dsa: mv88e6060: remove poll_link callback

2015-11-10 Thread Neil Armstrong
As was done for mv88e6xxx, remove the poll_link callback, since link
state change polling is now handled by phylib.

Tested on a mv88e6060 B0 device with a TI DM816X SoC.

Suggested-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 49 -
 1 file changed, 49 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 9093577..6885ef5 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -225,54 +225,6 @@ mv88e6060_phy_write(struct dsa_switch *ds, int port, int 
regnum, u16 val)
return reg_write(ds, addr, regnum, val);
 }

-static void mv88e6060_poll_link(struct dsa_switch *ds)
-{
-   int i;
-
-   for (i = 0; i < DSA_MAX_PORTS; i++) {
-   struct net_device *dev;
-   int uninitialized_var(port_status);
-   int link;
-   int speed;
-   int duplex;
-   int fc;
-
-   dev = ds->ports[i];
-   if (dev == NULL)
-   continue;
-
-   link = 0;
-   if (dev->flags & IFF_UP) {
-   port_status = reg_read(ds, REG_PORT(i), 0x00);
-   if (port_status < 0)
-   continue;
-
-   link = !!(port_status & 0x1000);
-   }
-
-   if (!link) {
-   if (netif_carrier_ok(dev)) {
-   netdev_info(dev, "link down\n");
-   netif_carrier_off(dev);
-   }
-   continue;
-   }
-
-   speed = (port_status & 0x0100) ? 100 : 10;
-   duplex = (port_status & 0x0200) ? 1 : 0;
-   fc = ((port_status & 0xc000) == 0xc000) ? 1 : 0;
-
-   if (!netif_carrier_ok(dev)) {
-   netdev_info(dev,
-   "link up, %d Mb/s, %s duplex, flow control 
%sabled\n",
-   speed,
-   duplex ? "full" : "half",
-   fc ? "en" : "dis");
-   netif_carrier_on(dev);
-   }
-   }
-}
-
 static struct dsa_switch_driver mv88e6060_switch_driver = {
.tag_protocol   = DSA_TAG_PROTO_TRAILER,
.probe  = mv88e6060_probe,
@@ -280,7 +232,6 @@ static struct dsa_switch_driver mv88e6060_switch_driver = {
.set_addr   = mv88e6060_set_addr,
.phy_read   = mv88e6060_phy_read,
.phy_write  = mv88e6060_phy_write,
-   .poll_link  = mv88e6060_poll_link,
 };

 static int __init mv88e6060_init(void)
-- 
1.9.1



[PATCH 0/6] net: dsa: mv88e6060: cleanup and fix setup

2015-11-10 Thread Neil Armstrong
This patchset introduces some fixes and a register addressing cleanup for
the mv88e6060 DSA driver.

The first patch removes the poll_link callback, as was done for mv88e6xxx.
The following 3 patches fix the setup according to the datasheet.
The last 2 patches introduce a clean header and replace all magic values.

Neil Armstrong (6):
  net: dsa: mv88e6060: remove poll_link callback
  net: dsa: mv88e6060: use the correct InitReady bit
  net: dsa: mv88e6060: use the correct MaxFrameSize bit
  net: dsa: mv88e6060: use the correct bit shift for mac0
  net: dsa: mv88e6060: add register defines header file
  net: dsa: mv88e6060: replace magic values with register defines

 drivers/net/dsa/mv88e6060.c | 114 +++-
 drivers/net/dsa/mv88e6060.h | 108 +
 2 files changed, 146 insertions(+), 76 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6060.h

-- 
1.9.1


[PATCH 1/1] can: sja1000: clear interrupts on start

2015-11-10 Thread Mirza Krak
From: Mirza Krak 

According to the SJA1000 data sheet, the error-warning (EI) interrupt is
not cleared by putting the controller into reset mode.

Consider the following case:
- The system is suspended (echo mem > /sys/power/state) and the SJA1000 is
left in operating state.
- A bus error condition occurs which activates the EI interrupt. The system
is still suspended, which means the EI interrupt will be neither handled
nor cleared.

If the above two events occur, on resume there is no way to return the
SJA1000 to operating state, except to cycle power to it.

By simply reading the IR register on start we will clear any previous
conditions that could be present.

Signed-off-by: Mirza Krak 
---
 drivers/net/can/sja1000/sja1000.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/can/sja1000/sja1000.c 
b/drivers/net/can/sja1000/sja1000.c
index 7b92e911a616..f10834be48a5 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -218,6 +218,9 @@ static void sja1000_start(struct net_device *dev)
priv->write_reg(priv, SJA1000_RXERR, 0x0);
priv->read_reg(priv, SJA1000_ECC);
 
+   /* clear interrupt flags */
+   priv->read_reg(priv, SJA1000_IR);
+
/* leave reset mode */
set_normal_mode(dev);
 }
-- 
2.1.0



[PATCH 6/6] net: dsa: mv88e6060: replace magic values with register defines

2015-11-10 Thread Neil Armstrong
To align with the mv88e6xxx code, use the register defines to
access all the register addresses and bit fields.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 64 ++---
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 449499d..6b5353e 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -15,9 +15,7 @@
 #include 
 #include 
 #include 
-
-#define REG_PORT(p) (8 + (p))
-#define REG_GLOBAL 0x0f
+#include "mv88e6060.h"

 static int reg_read(struct dsa_switch *ds, int addr, int reg)
 {
@@ -67,13 +65,14 @@ static char *mv88e6060_probe(struct device *host_dev, int 
sw_addr)
if (bus == NULL)
return NULL;

-   ret = mdiobus_read(bus, sw_addr + REG_PORT(0), 0x03);
+   ret = mdiobus_read(bus, sw_addr + REG_PORT(0), PORT_SWITCH_ID);
if (ret >= 0) {
-   if (ret == 0x0600)
+   if (ret == PORT_SWITCH_ID_6060)
return "Marvell 88E6060 (A0)";
-   if (ret == 0x0601 || ret == 0x0602)
+   if (ret == PORT_SWITCH_ID_6060_R1 ||
+   ret == PORT_SWITCH_ID_6060_R2)
return "Marvell 88E6060 (B0)";
-   if ((ret & 0xfff0) == 0x0600)
+   if ((ret & PORT_SWITCH_ID_6060_MASK) == PORT_SWITCH_ID_6060)
return "Marvell 88E6060";
}

@@ -87,22 +86,26 @@ static int mv88e6060_switch_reset(struct dsa_switch *ds)
unsigned long timeout;

/* Set all ports to the disabled state. */
-   for (i = 0; i < 6; i++) {
-   ret = REG_READ(REG_PORT(i), 0x04);
-   REG_WRITE(REG_PORT(i), 0x04, ret & 0xfffc);
+   for (i = 0; i < MV88E6060_PORTS; i++) {
+   ret = REG_READ(REG_PORT(i), PORT_CONTROL);
+   REG_WRITE(REG_PORT(i), PORT_CONTROL,
+ ret & ~PORT_CONTROL_STATE_MASK);
}

/* Wait for transmit queues to drain. */
usleep_range(2000, 4000);

/* Reset the switch. */
-   REG_WRITE(REG_GLOBAL, 0x0a, 0xa130);
+   REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
+ GLOBAL_ATU_CONTROL_SWRESET |
+ GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
+ GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);

/* Wait up to one second for reset to complete. */
timeout = jiffies + 1 * HZ;
while (time_before(jiffies, timeout)) {
-   ret = REG_READ(REG_GLOBAL, 0x00);
-   if ((ret & 0x800))
+   ret = REG_READ(REG_GLOBAL, GLOBAL_STATUS);
+   if ((ret & GLOBAL_STATUS_INIT_READY))
break;

usleep_range(1000, 2000);
@@ -119,13 +122,15 @@ static int mv88e6060_setup_global(struct dsa_switch *ds)
 * set the maximum frame size to 1536 bytes, and mask all
 * interrupt sources.
 */
-   REG_WRITE(REG_GLOBAL, 0x04, 0x400);
+   REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, GLOBAL_CONTROL_MAX_FRAME_1536);

/* Enable automatic address learning, set the address
 * database size to 1024 entries, and set the default aging
 * time to 5 minutes.
 */
-   REG_WRITE(REG_GLOBAL, 0x0a, 0x2130);
+   REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
+ GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
+ GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);

return 0;
 }
@@ -139,25 +144,30 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, 
int p)
 * state to Forwarding.  Additionally, if this is the CPU
 * port, enable Ingress and Egress Trailer tagging mode.
 */
-   REG_WRITE(addr, 0x04, dsa_is_cpu_port(ds, p) ?  0x4103 : 0x0003);
+   REG_WRITE(addr, PORT_CONTROL,
+ dsa_is_cpu_port(ds, p) ?
+   PORT_CONTROL_TRAILER |
+   PORT_CONTROL_INGRESS_MODE |
+   PORT_CONTROL_STATE_FORWARDING :
+   PORT_CONTROL_STATE_FORWARDING);

/* Port based VLAN map: give each port its own address
 * database, allow the CPU port to talk to each of the 'real'
 * ports, and allow each of the 'real' ports to only talk to
 * the CPU port.
 */
-   REG_WRITE(addr, 0x06,
-   ((p & 0xf) << 12) |
-(dsa_is_cpu_port(ds, p) ?
-   ds->phys_port_mask :
-   (1 << ds->dst->cpu_port)));
+   REG_WRITE(addr, PORT_VLAN_MAP,
+ ((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
+  (dsa_is_cpu_port(ds, p) ?
+   ds->phys_port_mask :
+   BIT(ds->dst->cpu_port)));

/* Port Association Vector: when learning source addresses
 * of packets, add the address to the address database using
   

[PATCH 3/6] net: dsa: mv88e6060: use the correct MaxFrameSize bit

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the MaxFrameSize bit position
is 10 instead of 11 which is reserved.
Use the bit correctly to setup max frame size to 1536.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 26f668c..6a35e3f 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -119,7 +119,7 @@ static int mv88e6060_setup_global(struct dsa_switch *ds)
 * set the maximum frame size to 1536 bytes, and mask all
 * interrupt sources.
 */
-   REG_WRITE(REG_GLOBAL, 0x04, 0x0800);
+   REG_WRITE(REG_GLOBAL, 0x04, 0x400);

/* Enable automatic address learning, set the address
 * database size to 1024 entries, and set the default aging
-- 
1.9.1


[PATCH 4/6] net: dsa: mv88e6060: use the correct bit shift for mac0

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the first mac byte must
be at position 9 instead of 8 since the bit 8 is used to select
if the mac address must differ for each port for Pause frames.
Use the correct shift and set the same mac address for all ports.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 6a35e3f..449499d 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -188,7 +188,8 @@ static int mv88e6060_setup(struct dsa_switch *ds)

 static int mv88e6060_set_addr(struct dsa_switch *ds, u8 *addr)
 {
-   REG_WRITE(REG_GLOBAL, 0x01, (addr[0] << 8) | addr[1]);
+   /* Use the same MAC Address as FD Pause frames for all ports */
+   REG_WRITE(REG_GLOBAL, 0x01, (addr[0] << 9) | addr[1]);
REG_WRITE(REG_GLOBAL, 0x02, (addr[2] << 8) | addr[3]);
REG_WRITE(REG_GLOBAL, 0x03, (addr[4] << 8) | addr[5]);

-- 
1.9.1


[PATCH 2/6] net: dsa: mv88e6060: use the correct InitReady bit

2015-11-10 Thread Neil Armstrong
According to the mv88e6060 datasheet, the InitReady bit position
is 11 and the polarity is inverted.
Use the bit correctly to detect the end of initialization.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 6885ef5..26f668c 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -102,7 +102,7 @@ static int mv88e6060_switch_reset(struct dsa_switch *ds)
timeout = jiffies + 1 * HZ;
while (time_before(jiffies, timeout)) {
ret = REG_READ(REG_GLOBAL, 0x00);
-   if ((ret & 0x8000) == 0x0000)
+   if ((ret & 0x800))
break;

usleep_range(1000, 2000);
-- 
1.9.1


[PATCH 5/6] net: dsa: mv88e6060: add register defines header file

2015-11-10 Thread Neil Armstrong
To align with the mv88e6xxx code, add a similar header file
with all the register defines.
The file is based on the mv88e6xxx header for coherency.

Signed-off-by: Neil Armstrong 
---
 drivers/net/dsa/mv88e6060.h | 108 
 1 file changed, 108 insertions(+)
 create mode 100644 drivers/net/dsa/mv88e6060.h

diff --git a/drivers/net/dsa/mv88e6060.h b/drivers/net/dsa/mv88e6060.h
new file mode 100644
index 000..adbc894
--- /dev/null
+++ b/drivers/net/dsa/mv88e6060.h
@@ -0,0 +1,108 @@
+/*
+ * net/dsa/mv88e6060.h - Marvell 88e6060 switch chip support
+ * Copyright (c) 2008 Marvell Semiconductor
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __MV88E6060_H
+#define __MV88E6060_H
+
+#define MV88E6060_PORTS 6
+
+#define REG_PORT(p) (0x8 + (p))
+#define PORT_STATUS 0x00
+#define PORT_STATUS_PAUSE_EN   BIT(15)
+#define PORT_STATUS_MY_PAUSE   BIT(14)
+#define PORT_STATUS_FC (PORT_STATUS_MY_PAUSE | PORT_STATUS_PAUSE_EN)
+#define PORT_STATUS_RESOLVED   BIT(13)
+#define PORT_STATUS_LINK   BIT(12)
+#define PORT_STATUS_PORTMODE   BIT(11)
+#define PORT_STATUS_PHYMODE BIT(10)
+#define PORT_STATUS_DUPLEX BIT(9)
+#define PORT_STATUS_SPEED  BIT(8)
+#define PORT_SWITCH_ID 0x03
+#define PORT_SWITCH_ID_6060 0x0600
+#define PORT_SWITCH_ID_6060_MASK   0xfff0
+#define PORT_SWITCH_ID_6060_R1 0x0601
+#define PORT_SWITCH_ID_6060_R2 0x0602
+#define PORT_CONTROL   0x04
+#define PORT_CONTROL_FORCE_FLOW_CTRL   BIT(15)
+#define PORT_CONTROL_TRAILER   BIT(14)
+#define PORT_CONTROL_HEADER BIT(11)
+#define PORT_CONTROL_INGRESS_MODE  BIT(8)
+#define PORT_CONTROL_VLAN_TUNNEL   BIT(7)
+#define PORT_CONTROL_STATE_MASK 0x03
+#define PORT_CONTROL_STATE_DISABLED 0x00
+#define PORT_CONTROL_STATE_BLOCKING 0x01
+#define PORT_CONTROL_STATE_LEARNING 0x02
+#define PORT_CONTROL_STATE_FORWARDING  0x03
+#define PORT_VLAN_MAP  0x06
+#define PORT_VLAN_MAP_DBNUM_SHIFT  12
+#define PORT_VLAN_MAP_TABLE_MASK   0x1f
+#define PORT_ASSOC_VECTOR  0x0b
+#define PORT_ASSOC_VECTOR_MONITOR  BIT(15)
+#define PORT_ASSOC_VECTOR_PAV_MASK 0x1f
+#define PORT_RX_CNTR   0x10
+#define PORT_TX_CNTR   0x11
+
+#define REG_GLOBAL 0x0f
+#define GLOBAL_STATUS  0x00
+#define GLOBAL_STATUS_SW_MODE_MASK (0x3 << 12)
+#define GLOBAL_STATUS_SW_MODE_0 (0x0 << 12)
+#define GLOBAL_STATUS_SW_MODE_1 (0x1 << 12)
+#define GLOBAL_STATUS_SW_MODE_2 (0x2 << 12)
+#define GLOBAL_STATUS_SW_MODE_3 (0x3 << 12)
+#define GLOBAL_STATUS_INIT_READY   BIT(11)
+#define GLOBAL_STATUS_ATU_FULL BIT(3)
+#define GLOBAL_STATUS_ATU_DONE BIT(2)
+#define GLOBAL_STATUS_PHY_INT  BIT(1)
+#define GLOBAL_STATUS_EEINT BIT(0)
+#define GLOBAL_MAC_01  0x01
+#define GLOBAL_MAC_01_DIFF_ADDR BIT(8)
+#define GLOBAL_MAC_23  0x02
+#define GLOBAL_MAC_45  0x03
+#define GLOBAL_CONTROL 0x04
+#define GLOBAL_CONTROL_DISCARD_EXCESS  BIT(13)
+#define GLOBAL_CONTROL_MAX_FRAME_1536  BIT(10)
+#define GLOBAL_CONTROL_RELOAD_EEPROM   BIT(9)
+#define GLOBAL_CONTROL_CTRMODE BIT(8)
+#define GLOBAL_CONTROL_ATU_FULL_EN BIT(3)
+#define GLOBAL_CONTROL_ATU_DONE_EN BIT(2)
+#define GLOBAL_CONTROL_PHYINT_EN   BIT(1)
+#define GLOBAL_CONTROL_EEPROM_DONE_EN  BIT(0)
+#define GLOBAL_ATU_CONTROL 0x0a
+#define GLOBAL_ATU_CONTROL_SWRESET BIT(15)
+#define GLOBAL_ATU_CONTROL_LEARNDIS BIT(14)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_256 (0x0 << 12)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_512 (0x1 << 12)
+#define GLOBAL_ATU_CONTROL_ATUSIZE_1024 (0x2 << 12)
+#define GLOBAL_ATU_CONTROL_ATE_AGE_SHIFT   4
+#define GLOBAL_ATU_CONTROL_ATE_AGE_MASK (0xff << 4)
+#define GLOBAL_ATU_CONTROL_ATE_AGE_5MIN (0x13 << 4)
+#define GLOBAL_ATU_OP  0x0b
+#define GLOBAL_ATU_OP_BUSY BIT(15)
+#define GLOBAL_ATU_OP_NOP  (0 << 12)
+#define GLOBAL_ATU_OP_FLUSH_ALL ((1 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_UNLOCKED   ((2 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_LOAD_DB  ((3 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_GET_NEXT_DB  ((4 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_DB ((5 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_OP_FLUSH_UNLOCKED_DB ((6 << 12) | GLOBAL_ATU_OP_BUSY)
+#define GLOBAL_ATU_DATA 0x0c
+#define GLOBAL_ATU_DATA_PORT_VECTOR_MASK   0x3f0
+#define GLOBAL_ATU_DATA_PORT_VECTOR_SHIFT  4
+#define GLOBAL_ATU_DATA_STATE_MASK 0x0f
+#define GLOBAL_ATU_DATA_STATE_UNUSED   0x00
+#define GLOBAL_ATU_DATA_STATE_UC_STATIC 0x0e
+#define 

Re: [PATCH, REPORT] bpf_trace: build error without PERF_EVENTS

2015-11-10 Thread Daniel Borkmann

On 11/10/2015 01:55 PM, Arnd Bergmann wrote:

> In my ARM randconfig tests, I'm getting a build error for
> newly added code in bpf_perf_event_read and bpf_perf_event_output
> whenever CONFIG_PERF_EVENTS is disabled:
>
> kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
> kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
> if (event->oncpu != smp_processor_id() ||
>   ^
> kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
>    event->pmu->count)
>
> This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
> is disabled. I'm not sure if that is a configuration we care
> about, otherwise we could prevent this case from occurring by
> adding Kconfig dependencies.

I think that seems better than spreading #if IS_ENABLEDs into the code.
Probably enough to add a 'depends on PERF_EVENTS' to config BPF_EVENTS,
so it's also explicitly documented.

> Simply hiding the broken code inside #ifdef CONFIG_PERF_EVENTS
> as this patch does seems to reliably fix the error as well,
> I have built thousands of randconfig kernels since I started
> seeing this and added the workaround.
>
> Signed-off-by: Arnd Bergmann
> Fixes: 62544ce8e01c ("bpf: fix bpf_perf_event_read() helper")
> Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")


Thanks,
Daniel


Re: [PATCH] arm64: bpf: fix JIT stack setup

2015-11-10 Thread Z Lim
On Tue, Nov 10, 2015 at 11:46 AM, Shi, Yang  wrote:
> On 11/9/2015 12:00 PM, Z Lim wrote:
>>
>> How about splitting this into two patches? One for the BPF-related
>> bug, and another for A64 FP-handling.
>
> I'm not sure if this is a good approach or not. IMHO, they are kind of
> atomic. Without A64 FP-handling, that fix looks incomplete and introduces
> another problem (stack backtrace).
>

The first, even on its own, doesn't make things worse, only better.
The second, which we agree needs to be fixed also, addresses a different issue.

Either way, please also note that these patches fix the original
implementation. We do want -stable to pick these up.

Suggestions for the diagram:
- As an enhancement, would you mind showing the A64_FP also?
- Please revisit "+64:"


kasan r8169 use-after-free trace.

2015-11-10 Thread Dave Jones
This happens during boot (and then there's a flood of traces that happen so
fast afterwards that it completely overwhelms the serial console; not sure
if they're the same/related or not).


==
BUG: KASAN: use-after-free in rtl8169_poll+0x4b6/0xb70 at addr 8801d43b3288
Read of size 1 by task kworker/0:3/188
=
BUG kmalloc-256 (Not tainted): kasan: bad access detected
-

Disabling lock debugging due to kernel taint
INFO: Slab 0xea000750ecc0 objects=16 used=16 fp=0x  (null) 
flags=0x8080
INFO: Object 0x8801d43b3200 @offset=512 fp=0x8801d43b3800

Bytes b4 8801d43b31f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b3200: 00 38 3b d4 01 88 ff ff 00 00 00 00 00 00 00 00  
.8;.
Object 8801d43b3210: 0d 17 8e 3c 8b 87 15 14 00 00 00 00 00 00 00 00  
...<
Object 8801d43b3220: 00 80 bb 37 00 88 ff ff 00 00 00 00 00 00 00 00  
...7
Object 8801d43b3230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b3240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b3250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b3260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b3270: 00 00 00 00 00 00 00 00 2e 00 00 00 00 00 00 00  

Object 8801d43b3280: 0e 00 00 00 00 00 21 00 01 00 00 00 00 00 00 00  
..!.
Object 8801d43b3290: 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  

Object 8801d43b32a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Object 8801d43b32b0: 00 00 00 00 08 06 4e 00 4e 00 40 00 7c 00 00 00  
..N.N.@.|...
Object 8801d43b32c0: 80 00 00 00 00 00 00 00 40 7e 60 d5 01 88 ff ff  
@~`.
Object 8801d43b32d0: 8e 7e 60 d5 01 88 ff ff c0 02 00 00 01 00 00 00  
.~`.
Object 8801d43b32e0: 40 82 c5 d3 01 88 ff ff 00 00 00 00 00 00 00 00  
@...
Object 8801d43b32f0: a8 1c 2d d5 00 88 ff ff 00 00 00 00 00 00 00 00  
..-.
CPU: 0 PID: 188 Comm: kworker/0:3 Tainted: GB   4.3.0-firewall+ #15
Workqueue: events linkwatch_event
 880037bb89d8 8801d7a07bc8 93489155 8801d6801900
 8801d7a07bf8 932295de 8801d6801900 ea000750ecc0
 8801d43b3200 8800d442a000 8801d7a07c20 9322ce06
Call Trace:
   [] dump_stack+0x4e/0x79
 [] print_trailer+0xfe/0x160
 [] object_err+0x36/0x40
 [] kasan_report_error+0x220/0x550
 [] ? dev_gro_receive+0xbb/0x7f0
 [] ? dev_gro_receive+0x2b9/0x7f0
 [] kasan_report+0x3b/0x40
 [] ? rtl8169_poll+0x4b6/0xb70
 [] __asan_load1+0x48/0x50
 [] rtl8169_poll+0x4b6/0xb70
 [] ? _raw_spin_unlock_irqrestore+0x43/0x70
 [] net_rx_action+0x41b/0x6a0
 [] ? napi_complete_done+0x100/0x100
 [] __do_softirq+0x1b2/0x5c0
 [] irq_exit+0xfc/0x110
 [] do_IRQ+0x82/0x160
 [] common_interrupt+0x86/0x86
   [] ? console_unlock+0x3bd/0x620
 [] vprintk_emit+0x3ce/0x6d0

