date:20150805

Re: [PATCH net-next v2 2/7] net: switchdev: support static FDB addresses

2015-08-05 Thread Scott Feldman

On Wed, Aug 5, 2015 at 10:44 PM, Vivien Didelot wrote:
> This patch adds an is_static boolean to the switchdev_obj_fdb structure,
> in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.
>
> Signed-off-by: Vivien Didelot 
> ---
>  include/net/switchdev.h   | 1 +
>  net/switchdev/switchdev.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index e90e1a0..0e296b8 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -72,6 +72,7 @@ struct switchdev_obj {
> struct switchdev_obj_fdb {  /* PORT_FDB */
> u8 addr[ETH_ALEN];
> u16 vid;
> +   bool is_static;

What do you think about changing this to u16 ndm_state?  That way, it
can be used on input (fdb add) and output (fdb dump), and the driver
can privately track the state, kind of like how the bridge keeps
is_static, is_local, etc.
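
To make the suggestion concrete, here is a minimal userspace sketch of an FDB object that carries the state itself. The struct and function names are hypothetical, and the two state values are assumed to match the kernel's <linux/neighbour.h>; nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of two neighbour states; the values are assumed to match
 * the kernel's uapi header <linux/neighbour.h>. */
#define SKETCH_NUD_REACHABLE 0x02
#define SKETCH_NUD_NOARP     0x40

/* Hypothetical FDB object that carries the netlink state directly. */
struct sketch_fdb {
	uint8_t addr[6];
	uint16_t vid;
	uint16_t ndm_state; /* set by fdb add (input), filled by fdb dump (output) */
};

/* Driver-side view: derive a private "static" flag from the state. */
bool sketch_fdb_is_static(const struct sketch_fdb *fdb)
{
	assert(fdb);
	return (fdb->ndm_state & SKETCH_NUD_NOARP) != 0;
}

/* Dump path: the mapping the v2 patch computes from is_static. */
uint16_t sketch_state_from_static(bool is_static)
{
	return is_static ? SKETCH_NUD_NOARP : SKETCH_NUD_REACHABLE;
}
```

With a field like this, the dump callback could copy the state straight into ndm->ndm_state rather than translating a boolean.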

> } fdb;
> } u;
>  };
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 9db87a3..e9d1cac 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -811,7 +811,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
> ndm->ndm_flags   = NTF_SELF;
> ndm->ndm_type= 0;
> ndm->ndm_ifindex = dev->ifindex;
> -   ndm->ndm_state   = NUD_REACHABLE;
> +   ndm->ndm_state   = obj->u.fdb.is_static ? NUD_NOARP : NUD_REACHABLE;
>
> if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
> goto nla_put_failure;
> --
> 2.4.6
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next v2 0/7] net: dsa: mv88e6xxx: support switchdev FDB objects

2015-08-05 Thread Vivien Didelot

This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB
objects.

The first two patches add minor but necessary changes to switchdev, the third
one implements the switchdev glue in DSA for FDB routines, and the remaining
ones refactor the FDB access functions in the mv88e6xxx code.

Below is a usage example (ports 0-2 belong to br0, ports 3-4 belong to br1):

# bridge fdb add 3c:97:0e:11:30:6e dev swp2
# bridge fdb add 3c:97:0e:11:40:78 dev swp3
# bridge fdb add 3c:97:0e:11:50:86 dev swp4
# bridge fdb del 3c:97:0e:11:40:78 dev swp3
# bridge fdb
01:00:5e:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth1 self permanent
00:50:d2:10:78:15 dev swp0 master br0 permanent
3c:97:0e:11:30:6e dev swp2 self static
00:50:d2:10:78:15 dev swp3 master br1 permanent
3c:97:0e:11:50:86 dev swp4 self static
# cat /sys/kernel/debug/dsa0/atu
# DB   T/P  Vec State Addr
# 001  Port 004   e   3c:97:0e:11:30:6e
# 004  Port 010   e   3c:97:0e:11:50:86

For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged
ports and bridge groups, and the remaining ones will later be used by VLANs.

This change is needed to prepare for hardware VLAN support (which will
follow soon).

Changes in v2:

 - remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code

 - use ether_addr_copy instead of memcpy for MAC addresses

 - constify MAC address in port_fdb_{add,del}

 - split the mv88e6xxx code refactoring into several patches

Vivien Didelot (7):
  net: switchdev: change fdb addr for a byte array
  net: switchdev: support static FDB addresses
  net: dsa: add support for switchdev FDB objects
  net: dsa: mv88e6xxx: extend fid mask
  net: dsa: mv88e6xxx: rename ATU MAC accessors
  net: dsa: mv88e6xxx: rework FDB getnext operation
  net: dsa: mv88e6xxx: rework FDB add/del operations

 drivers/net/dsa/mv88e6171.c  |   6 +-
 drivers/net/dsa/mv88e6352.c  |   6 +-
 drivers/net/dsa/mv88e6xxx.c  | 223 ---
 drivers/net/dsa/mv88e6xxx.h  |  31 +++--
 drivers/net/ethernet/rocker/rocker.c |   2 +-
 include/net/dsa.h|  16 ++-
 include/net/switchdev.h  |   3 +-
 net/bridge/br_fdb.c  |   2 +-
 net/dsa/slave.c  | 218 ++
 net/switchdev/switchdev.c|   7 +-
 10 files changed, 317 insertions(+), 197 deletions(-)

-- 
2.4.6


[PATCH net-next v2 1/7] net: switchdev: change fdb addr for a byte array

2015-08-05 Thread Vivien Didelot

The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it with a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).
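
The difference can be sketched in userspace C. All names below are hypothetical stand-ins (sketch_ether_addr_copy mimics the kernel's ether_addr_copy with a plain memcpy); the point is only that an embedded array stays valid after the source buffer is reused, while a borrowed pointer does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Before: the object only borrows the address, so the storage it points
 * to must outlive the object. */
struct sketch_fdb_ptr {
	const uint8_t *addr;
	uint16_t vid;
};

/* After: the object owns a copy of the address. */
struct sketch_fdb_arr {
	uint8_t addr[SKETCH_ETH_ALEN];
	uint16_t vid;
};

/* Userspace stand-in for the kernel's ether_addr_copy(). */
void sketch_ether_addr_copy(uint8_t *dst, const uint8_t *src)
{
	assert(dst && src);
	memcpy(dst, src, SKETCH_ETH_ALEN);
}

/* Builds an object from a scratch buffer, e.g. bytes just read from
 * hardware registers; the caller may reuse the buffer afterwards. */
struct sketch_fdb_arr sketch_make_entry(const uint8_t *scratch, uint16_t vid)
{
	struct sketch_fdb_arr obj = { .vid = vid };

	sketch_ether_addr_copy(obj.addr, scratch);
	return obj;
}
```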

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 2 +-
 include/net/switchdev.h  | 2 +-
 net/bridge/br_fdb.c  | 2 +-
 net/switchdev/switchdev.c| 5 +++--
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4cd5a71..a5bf809 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4543,7 +4543,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
if (found->key.pport != rocker_port->pport)
continue;
-   fdb->addr = found->key.addr;
+   ether_addr_copy(fdb->addr, found->key.addr);
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
err = obj->cb(rocker_port->dev, obj);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 89da893..e90e1a0 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -70,7 +70,7 @@ struct switchdev_obj {
u32 tb_id;
} ipv4_fib;
struct switchdev_obj_fdb {  /* PORT_FDB */
-   const unsigned char *addr;
+   u8 addr[ETH_ALEN];
u16 vid;
} fdb;
} u;
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 9e9875d..5656b44 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -136,11 +136,11 @@ static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
struct switchdev_obj obj = {
.id = SWITCHDEV_OBJ_PORT_FDB,
.u.fdb = {
-   .addr = f->addr.addr,
.vid = f->vlan_id,
},
};
 
+   ether_addr_copy(obj.u.fdb.addr, f->addr.addr);
switchdev_port_obj_del(f->dst->dev, &obj);
 }
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 33bafa2..9db87a3 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -742,11 +743,11 @@ int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct switchdev_obj obj = {
.id = SWITCHDEV_OBJ_PORT_FDB,
.u.fdb = {
-   .addr = addr,
.vid = vid,
},
};
 
+   ether_addr_copy(obj.u.fdb.addr, addr);
return switchdev_port_obj_add(dev, &obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_add);
@@ -769,11 +770,11 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
struct switchdev_obj obj = {
.id = SWITCHDEV_OBJ_PORT_FDB,
.u.fdb = {
-   .addr = addr,
.vid = vid,
},
};
 
+   ether_addr_copy(obj.u.fdb.addr, addr);
return switchdev_port_obj_del(dev, &obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
-- 
2.4.6


[PATCH net-next v2 4/7] net: dsa: mv88e6xxx: extend fid mask

2015-08-05 Thread Vivien Didelot

The driver currently manages one FID per port (or bridge group), with a
mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use.

The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff;
FID 0 means that multiple address databases are not being used).

This patch replaces the fid_mask with a fid_bitmap of 4096 bits.

From now on, FIDs 1 to num_ports are reserved for non-bridged ports and
bridge groups (a bridge group gets the FID of its first member). The
remaining bits will be reserved for VLAN entries.
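
The allocation scheme described above can be sketched in plain userspace C. This is an illustration only: the sketch_* names are hypothetical, and the hand-rolled bit helpers stand in for the kernel's set_bit, clear_bit and find_next_zero_bit.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_N_FID 4096 /* bitmap size, as in the patch */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS \
	((SKETCH_N_FID + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_fids {
	unsigned long bitmap[SKETCH_LONGS]; /* a set bit means "FID in use" */
	unsigned int num_ports;
};

void sketch_set_fid(struct sketch_fids *f, unsigned int fid)
{
	assert(fid < SKETCH_N_FID);
	f->bitmap[fid / SKETCH_BITS_PER_LONG] |= 1UL << (fid % SKETCH_BITS_PER_LONG);
}

void sketch_clear_fid(struct sketch_fids *f, unsigned int fid)
{
	assert(fid < SKETCH_N_FID);
	f->bitmap[fid / SKETCH_BITS_PER_LONG] &= ~(1UL << (fid % SKETCH_BITS_PER_LONG));
}

int sketch_fid_in_use(const struct sketch_fids *f, unsigned int fid)
{
	assert(fid < SKETCH_N_FID);
	return (f->bitmap[fid / SKETCH_BITS_PER_LONG] >>
		(fid % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Initial setup: port N starts with FID N + 1, as in setup_port. */
void sketch_fids_init(struct sketch_fids *f, unsigned int num_ports)
{
	unsigned int port;

	memset(f, 0, sizeof(*f));
	f->num_ports = num_ports;
	for (port = 0; port < num_ports; port++)
		sketch_set_fid(f, port + 1);
}

/* Leaving a bridge: take the lowest free FID in 1..num_ports and mark it
 * used. Returns -1 when all of them are taken. */
int sketch_alloc_port_fid(struct sketch_fids *f)
{
	unsigned int fid;

	for (fid = 1; fid <= f->num_ports; fid++) {
		if (!sketch_fid_in_use(f, fid)) {
			sketch_set_fid(f, fid);
			return (int)fid;
		}
	}
	return -1;
}
```

A port joining a bridge releases its own FID with sketch_clear_fid, which is what frees a slot for a later sketch_alloc_port_fid call.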

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 20 +---
 drivers/net/dsa/mv88e6xxx.h |  8 +---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 438c73e..b051576 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1091,7 +1091,7 @@ int mv88e6xxx_join_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
ps->bridge_mask[fid] = br_port_mask;
 
if (fid != ps->fid[port]) {
-   ps->fid_mask |= 1 << ps->fid[port];
+   clear_bit(ps->fid[port], ps->fid_bitmap);
ps->fid[port] = fid;
ret = _mv88e6xxx_update_bridge_config(ds, fid);
}
@@ -1125,9 +1125,16 @@ int mv88e6xxx_leave_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
 
mutex_lock(&ps->smi_mutex);
 
-   newfid = __ffs(ps->fid_mask);
+   newfid = find_next_zero_bit(ps->fid_bitmap, VLAN_N_VID, 1);
+   if (unlikely(newfid > ps->num_ports)) {
+   netdev_err(ds->ports[port], "all first %d FIDs are used\n",
+  ps->num_ports);
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
ps->fid[port] = newfid;
-   ps->fid_mask &= ~(1 << newfid);
+   set_bit(newfid, ps->fid_bitmap);
ps->bridge_mask[fid] &= ~(1 << port);
ps->bridge_mask[newfid] = 1 << port;
 
@@ -1135,6 +1142,7 @@ int mv88e6xxx_leave_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
if (!ret)
ret = _mv88e6xxx_update_bridge_config(ds, newfid);
 
+unlock:
mutex_unlock(&ps->smi_mutex);
 
return ret;
@@ -1554,9 +1562,9 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 * ports, and allow each of the 'real' ports to only talk to
 * the upstream port.
 */
-   fid = __ffs(ps->fid_mask);
+   fid = port + 1;
ps->fid[port] = fid;
-   ps->fid_mask &= ~(1 << fid);
+   set_bit(fid, ps->fid_bitmap);
 
if (!dsa_is_cpu_port(ds, port))
ps->bridge_mask[fid] = 1 << port;
@@ -1855,8 +1863,6 @@ int mv88e6xxx_setup_common(struct dsa_switch *ds)
 
ps->id = REG_READ(REG_PORT(0), PORT_SWITCH_ID) & 0xfff0;
 
-   ps->fid_mask = (1 << DSA_MAX_PORTS) - 1;
-
INIT_WORK(&ps->bridge_work, mv88e6xxx_bridge_work);
 
name = kasprintf(GFP_KERNEL, "dsa%d", ds->index);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 6a66b4b..6d65b99 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -11,6 +11,8 @@
 #ifndef __MV88E6XXX_H
 #define __MV88E6XXX_H
 
+#include 
+
 #ifndef UINT64_MAX
 #define UINT64_MAX (u64)(~((u64)0))
 #endif
@@ -348,9 +350,9 @@ struct mv88e6xxx_priv_state {
 
/* hw bridging */
 
-   u32 fid_mask;
-   u8 fid[DSA_MAX_PORTS];
-   u16 bridge_mask[DSA_MAX_PORTS];
+   DECLARE_BITMAP(fid_bitmap, VLAN_N_VID); /* FIDs 1 to 4095 available */
+   u16 fid[DSA_MAX_PORTS]; /* per (non-bridged) port FID */
+   u16 bridge_mask[DSA_MAX_PORTS]; /* br groups (indexed by FID) */
 
unsigned long port_state_update_mask;
u8 port_state[DSA_MAX_PORTS];
-- 
2.4.6


[PATCH net-next v2 3/7] net: dsa: add support for switchdev FDB objects

2015-08-05 Thread Vivien Didelot

Remove the fdb_{add,del,getnext} function pointers in favor of the new
port_fdb_{add,del,getnext} ones.

Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
support the SWITCHDEV_OBJ_PORT_FDB objects.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6171.c |   3 -
 drivers/net/dsa/mv88e6352.c |   3 -
 include/net/dsa.h   |  16 ++--
 net/dsa/slave.c | 218 +++-
 4 files changed, 126 insertions(+), 114 deletions(-)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1c78084..cfa21ed 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -116,9 +116,6 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
-   .fdb_add= mv88e6xxx_port_fdb_add,
-   .fdb_del= mv88e6xxx_port_fdb_del,
-   .fdb_getnext= mv88e6xxx_port_fdb_getnext,
 };
 
 MODULE_ALIAS("platform:mv88e6171");
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index af210ef..eb4630f 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -341,9 +341,6 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
-   .fdb_add= mv88e6xxx_port_fdb_add,
-   .fdb_del= mv88e6xxx_port_fdb_del,
-   .fdb_getnext= mv88e6xxx_port_fdb_getnext,
 };
 
 MODULE_ALIAS("platform:mv88e6172");
diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63b..091d35f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -296,12 +296,16 @@ struct dsa_switch_driver {
 u32 br_port_mask);
int (*port_stp_update)(struct dsa_switch *ds, int port,
   u8 state);
-   int (*fdb_add)(struct dsa_switch *ds, int port,
-  const unsigned char *addr, u16 vid);
-   int (*fdb_del)(struct dsa_switch *ds, int port,
-  const unsigned char *addr, u16 vid);
-   int (*fdb_getnext)(struct dsa_switch *ds, int port,
-  unsigned char *addr, bool *is_static);
+
+   /*
+* Forwarding database
+*/
+   int (*port_fdb_add)(struct dsa_switch *ds, int port, u16 vid,
+   const u8 addr[ETH_ALEN]);
+   int (*port_fdb_del)(struct dsa_switch *ds, int port, u16 vid,
+   const u8 addr[ETH_ALEN]);
+   int (*port_fdb_getnext)(struct dsa_switch *ds, int port, u16 *vid,
+   u8 addr[ETH_ALEN], bool *is_static);
 };
 
 void register_switch_driver(struct dsa_switch_driver *type);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0010c69..1dbdeaa 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dsa_priv.h"
 
 /* slave mii_bus handling ***/
@@ -200,105 +201,6 @@ out:
return 0;
 }
 
-static int dsa_slave_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
-struct net_device *dev,
-const unsigned char *addr, u16 vid, u16 nlm_flags)
-{
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_switch *ds = p->parent;
-   int ret = -EOPNOTSUPP;
-
-   if (ds->drv->fdb_add)
-   ret = ds->drv->fdb_add(ds, p->port, addr, vid);
-
-   return ret;
-}
-
-static int dsa_slave_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
-struct net_device *dev,
-const unsigned char *addr, u16 vid)
-{
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_switch *ds = p->parent;
-   int ret = -EOPNOTSUPP;
-
-   if (ds->drv->fdb_del)
-   ret = ds->drv->fdb_del(ds, p->port, addr, vid);
-
-   return ret;
-}
-
-static int dsa_slave_fill_info(struct net_device *dev, struct sk_buff *skb,
-  const unsigned char *addr, u16 vid,
-  bool is_static,
-  u32 portid, u32 seq, int type,
-  unsigned int flags)
-{
-   struct nlmsghdr *nlh;
-   struct ndmsg *ndm;
-
-   nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
-   if (!nlh)
-   return -EMSGSIZE;
-
-   ndm = nlmsg_data(nlh);
-   ndm->ndm_family  = AF_BRIDGE;
-   ndm->ndm_pad1= 0;
-   ndm->ndm_pad2= 0;
-   ndm->ndm_flags   = NTF_EXT_LEARNED;
-   ndm->ndm_type= 0;
-   ndm->ndm_ifindex = dev-

[PATCH net-next v2 2/7] net: switchdev: support static FDB addresses

2015-08-05 Thread Vivien Didelot

This patch adds an is_static boolean to the switchdev_obj_fdb structure,
in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.

Signed-off-by: Vivien Didelot 
---
 include/net/switchdev.h   | 1 +
 net/switchdev/switchdev.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index e90e1a0..0e296b8 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -72,6 +72,7 @@ struct switchdev_obj {
struct switchdev_obj_fdb {  /* PORT_FDB */
u8 addr[ETH_ALEN];
u16 vid;
+   bool is_static;
} fdb;
} u;
 };
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 9db87a3..e9d1cac 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -811,7 +811,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
ndm->ndm_flags   = NTF_SELF;
ndm->ndm_type= 0;
ndm->ndm_ifindex = dev->ifindex;
-   ndm->ndm_state   = NUD_REACHABLE;
+   ndm->ndm_state   = obj->u.fdb.is_static ? NUD_NOARP : NUD_REACHABLE;
 
if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
goto nla_put_failure;
-- 
2.4.6


[PATCH net-next v2 5/7] net: dsa: mv88e6xxx: rename ATU MAC accessors

2015-08-05 Thread Vivien Didelot

Rename the __mv88e6xxx_{read,write}_addr functions to more explicit
_mv88e6xxx_atu_mac_{read,write} functions, which also respect the single
underscore convention used in the file (meaning SMI lock must be held).

While at it, define their MAC address parameters as an array of
ETH_ALEN bytes instead of a char pointer.
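
As a rough illustration of such a pair of accessors, the userspace sketch below packs a MAC address into three 16-bit words and reads it back. The register file, the sketch_* names and the two-bytes-per-word layout are assumptions made for the example, not something this patch states. Note that in C an array-typed parameter such as addr[ETH_ALEN] is still passed as a pointer; the length mainly documents the expected size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN 6

/* Simulated register file: three 16-bit words hold one MAC address
 * (assumed layout, two bytes per word, most significant byte first). */
struct sketch_atu_regs {
	uint16_t mac[3];
};

void sketch_atu_mac_write(struct sketch_atu_regs *regs,
			  const uint8_t addr[SKETCH_ETH_ALEN])
{
	int i;

	assert(regs && addr);
	for (i = 0; i < 3; i++)
		regs->mac[i] = (uint16_t)((addr[i * 2] << 8) | addr[i * 2 + 1]);
}

void sketch_atu_mac_read(const struct sketch_atu_regs *regs,
			 uint8_t addr[SKETCH_ETH_ALEN])
{
	int i;

	assert(regs && addr);
	for (i = 0; i < 3; i++) {
		addr[i * 2] = (uint8_t)(regs->mac[i] >> 8);
		addr[i * 2 + 1] = (uint8_t)(regs->mac[i] & 0xff);
	}
}
```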

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index b051576..9dad0a7 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1182,8 +1182,8 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
return 0;
 }
 
-static int __mv88e6xxx_write_addr(struct dsa_switch *ds,
- const unsigned char *addr)
+static int _mv88e6xxx_atu_mac_write(struct dsa_switch *ds,
+   const u8 addr[ETH_ALEN])
 {
int i, ret;
 
@@ -1198,7 +1198,7 @@ static int __mv88e6xxx_write_addr(struct dsa_switch *ds,
return 0;
 }
 
-static int __mv88e6xxx_read_addr(struct dsa_switch *ds, unsigned char *addr)
+static int _mv88e6xxx_atu_mac_read(struct dsa_switch *ds, u8 addr[ETH_ALEN])
 {
int i, ret;
 
@@ -1225,7 +1225,7 @@ static int __mv88e6xxx_port_fdb_cmd(struct dsa_switch *ds, int port,
if (ret < 0)
return ret;
 
-   ret = __mv88e6xxx_write_addr(ds, addr);
+   ret = _mv88e6xxx_atu_mac_write(ds, addr);
if (ret < 0)
return ret;
 
@@ -1280,7 +1280,7 @@ static int __mv88e6xxx_port_getnext(struct dsa_switch *ds, int port,
if (ret < 0)
return ret;
 
-   ret = __mv88e6xxx_write_addr(ds, addr);
+   ret = _mv88e6xxx_atu_mac_write(ds, addr);
if (ret < 0)
return ret;
 
@@ -1297,7 +1297,7 @@ static int __mv88e6xxx_port_getnext(struct dsa_switch *ds, int port,
return -ENOENT;
} while (!(((ret >> 4) & 0xff) & (1 << port)));
 
-   ret = __mv88e6xxx_read_addr(ds, addr);
+   ret = _mv88e6xxx_atu_mac_read(ds, addr);
if (ret < 0)
return ret;
 
@@ -1661,7 +1661,7 @@ static int mv88e6xxx_atu_show_db(struct seq_file *s, struct dsa_switch *ds,
unsigned char addr[6];
int ret, data, state;
 
-   ret = __mv88e6xxx_write_addr(ds, bcast);
+   ret = _mv88e6xxx_atu_mac_write(ds, bcast);
if (ret < 0)
return ret;
 
@@ -1676,7 +1676,7 @@ static int mv88e6xxx_atu_show_db(struct seq_file *s, struct dsa_switch *ds,
state = data & GLOBAL_ATU_DATA_STATE_MASK;
if (state == GLOBAL_ATU_DATA_STATE_UNUSED)
break;
-   ret = __mv88e6xxx_read_addr(ds, addr);
+   ret = _mv88e6xxx_atu_mac_read(ds, addr);
if (ret < 0)
return ret;
mv88e6xxx_atu_show_entry(s, dbnum, addr, data);
-- 
2.4.6


[PATCH net-next v2 7/7] net: dsa: mv88e6xxx: rework FDB add/del operations

2015-08-05 Thread Vivien Didelot

Add a low level function for the ATU Load operation, and provide FDB add
and delete wrapper functions.
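
The register encoding done by the load operation can be tried out in userspace with the sketch below. The SKETCH_* mask, shift and state values are assumptions chosen for illustration (the real GLOBAL_ATU_DATA_* definitions live in mv88e6xxx.h and are not shown in this mail), and the struct is a simplified stand-in for mv88e6xxx_atu_entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define SKETCH_ATU_DATA_TRUNK             0x8000
#define SKETCH_ATU_DATA_TRUNK_ID_MASK     0x00f0
#define SKETCH_ATU_DATA_TRUNK_ID_SHIFT    4
#define SKETCH_ATU_DATA_PORT_VECTOR_MASK  0x3ff0
#define SKETCH_ATU_DATA_PORT_VECTOR_SHIFT 4
#define SKETCH_ATU_DATA_STATE_MASK        0x000f
#define SKETCH_ATU_DATA_STATE_UNUSED      0x00
#define SKETCH_ATU_DATA_STATE_UC_STATIC   0x0e

/* Simplified stand-in for the driver's ATU entry. */
struct sketch_atu_entry {
	uint8_t state;
	bool trunk;
	uint16_t portv_trunkid; /* port vector, or trunk ID if trunk is set */
};

/* Builds the 16-bit data word that a load operation would write. */
uint16_t sketch_atu_data_encode(const struct sketch_atu_entry *entry)
{
	uint16_t reg = 0;

	assert(entry);
	if (entry->state != SKETCH_ATU_DATA_STATE_UNUSED) {
		unsigned int mask, shift;

		if (entry->trunk) {
			reg |= SKETCH_ATU_DATA_TRUNK;
			mask = SKETCH_ATU_DATA_TRUNK_ID_MASK;
			shift = SKETCH_ATU_DATA_TRUNK_ID_SHIFT;
		} else {
			mask = SKETCH_ATU_DATA_PORT_VECTOR_MASK;
			shift = SKETCH_ATU_DATA_PORT_VECTOR_SHIFT;
		}
		reg |= (uint16_t)((entry->portv_trunkid << shift) & mask);
	}
	reg |= entry->state & SKETCH_ATU_DATA_STATE_MASK;
	return reg;
}
```

Under these assumed values, a static unicast entry on port 2 encodes to the same word as the old (0x10 << port) | state expression, and an unused entry (a delete) encodes to zero.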

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6171.c |   2 +
 drivers/net/dsa/mv88e6352.c |   2 +
 drivers/net/dsa/mv88e6xxx.c | 110 +---
 drivers/net/dsa/mv88e6xxx.h |   8 ++--
 4 files changed, 80 insertions(+), 42 deletions(-)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index b99fa50..735f04c 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -116,6 +116,8 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
+   .port_fdb_add   = mv88e6xxx_port_fdb_add,
+   .port_fdb_del   = mv88e6xxx_port_fdb_del,
.port_fdb_getnext   = mv88e6xxx_port_fdb_getnext,
 };
 
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 0a77135..191fb25 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -341,6 +341,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
+   .port_fdb_add   = mv88e6xxx_port_fdb_add,
+   .port_fdb_del   = mv88e6xxx_port_fdb_del,
.port_fdb_getnext   = mv88e6xxx_port_fdb_getnext,
 };
 
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 6cad168..39203bb 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1214,59 +1214,42 @@ static int _mv88e6xxx_atu_mac_read(struct dsa_switch *ds, u8 addr[ETH_ALEN])
return 0;
 }
 
-static int __mv88e6xxx_port_fdb_cmd(struct dsa_switch *ds, int port,
-   const unsigned char *addr, int state)
+static int _mv88e6xxx_atu_load(struct dsa_switch *ds,
+  struct mv88e6xxx_atu_entry *entry)
 {
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   u8 fid = ps->fid[port];
+   u16 reg = 0;
int ret;
 
ret = _mv88e6xxx_atu_wait(ds);
if (ret < 0)
return ret;
 
-   ret = _mv88e6xxx_atu_mac_write(ds, addr);
+   ret = _mv88e6xxx_atu_mac_write(ds, entry->mac);
if (ret < 0)
return ret;
 
-   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_ATU_DATA,
-  (0x10 << port) | state);
-   if (ret)
-   return ret;
+   if (entry->state != GLOBAL_ATU_DATA_STATE_UNUSED) {
+   unsigned int mask, shift;
 
-   ret = _mv88e6xxx_atu_cmd(ds, fid, GLOBAL_ATU_OP_LOAD_DB);
+   if (entry->trunk) {
+   reg |= GLOBAL_ATU_DATA_TRUNK;
+   mask = GLOBAL_ATU_DATA_TRUNK_ID_MASK;
+   shift = GLOBAL_ATU_DATA_TRUNK_ID_SHIFT;
+   } else {
+   mask = GLOBAL_ATU_DATA_PORT_VECTOR_MASK;
+   shift = GLOBAL_ATU_DATA_PORT_VECTOR_SHIFT;
+   }
 
-   return ret;
-}
+   reg |= (entry->portv_trunkid << shift) & mask;
+   }
 
-int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
-  const unsigned char *addr, u16 vid)
-{
-   int state = is_multicast_ether_addr(addr) ?
-   GLOBAL_ATU_DATA_STATE_MC_STATIC :
-   GLOBAL_ATU_DATA_STATE_UC_STATIC;
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int ret;
+   reg |= entry->state & GLOBAL_ATU_DATA_STATE_MASK;
 
-   mutex_lock(&ps->smi_mutex);
-   ret = __mv88e6xxx_port_fdb_cmd(ds, port, addr, state);
-   mutex_unlock(&ps->smi_mutex);
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_ATU_DATA, reg);
+   if (ret < 0)
+   return ret;
 
-   return ret;
-}
-
-int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
-  const unsigned char *addr, u16 vid)
-{
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int ret;
-
-   mutex_lock(&ps->smi_mutex);
-   ret = __mv88e6xxx_port_fdb_cmd(ds, port, addr,
-  GLOBAL_ATU_DATA_STATE_UNUSED);
-   mutex_unlock(&ps->smi_mutex);
-
-   return ret;
+   return _mv88e6xxx_atu_cmd(ds, entry->fid, GLOBAL_ATU_OP_LOAD_DB);
 }
 
 static int _mv88e6xxx_atu_getnext(struct dsa_switch *ds, u16 fid,
@@ -1329,6 +1312,57 @@ static int _mv88e6xxx_port_vid_to_fid(struct dsa_switch *ds, int port, u16 vid)
return -ENOENT;
 }
 
+static int _mv88e6xxx_port_fdb_load(struct dsa_switch *ds, int port, u16 vid,
+   const u8 addr[ETH_ALEN], u8 state)
+{
+   struct mv88e6xxx_atu_entry entry = { 0 };
+   int ret;
+
+   r

[PATCH net-next v2 6/7] net: dsa: mv88e6xxx: rework FDB getnext operation

2015-08-05 Thread Vivien Didelot

This commit adds a low level _mv88e6xxx_atu_getnext function and helpers
to rewrite the mv88e6xxx_port_fdb_getnext operation.

A mv88e6xxx_atu_entry structure is added for convenient access to the
hardware, and GLOBAL_ATU_FID is defined instead of the raw 0x01 value.

The previous implementation did not handle a possible trunk mapping.
If the related bit is set, then the ATU data register would contain the
trunk ID, and not the port vector.

Check this in the FDB getnext operation and do not handle it (yet).
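
The trunk check can be illustrated with a small userspace decoder. The SKETCH_* bit positions are assumed values for the example and the struct is a simplified stand-in for mv88e6xxx_atu_entry; only the decision logic mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define SKETCH_ATU_DATA_TRUNK             0x8000
#define SKETCH_ATU_DATA_TRUNK_ID_MASK     0x00f0
#define SKETCH_ATU_DATA_TRUNK_ID_SHIFT    4
#define SKETCH_ATU_DATA_PORT_VECTOR_MASK  0x3ff0
#define SKETCH_ATU_DATA_PORT_VECTOR_SHIFT 4
#define SKETCH_ATU_DATA_STATE_MASK        0x000f
#define SKETCH_ATU_DATA_STATE_UNUSED      0x00

struct sketch_atu_entry {
	uint8_t state;
	bool trunk;
	uint16_t portv_trunkid; /* port vector, or trunk ID if trunk is set */
};

/* Splits a raw ATU data word into state, trunk flag and vector/ID. */
struct sketch_atu_entry sketch_atu_data_decode(uint16_t data)
{
	struct sketch_atu_entry next = { 0 };

	next.state = data & SKETCH_ATU_DATA_STATE_MASK;
	if (next.state != SKETCH_ATU_DATA_STATE_UNUSED) {
		unsigned int mask, shift;

		if (data & SKETCH_ATU_DATA_TRUNK) {
			next.trunk = true;
			mask = SKETCH_ATU_DATA_TRUNK_ID_MASK;
			shift = SKETCH_ATU_DATA_TRUNK_ID_SHIFT;
		} else {
			next.trunk = false;
			mask = SKETCH_ATU_DATA_PORT_VECTOR_MASK;
			shift = SKETCH_ATU_DATA_PORT_VECTOR_SHIFT;
		}
		next.portv_trunkid = (uint16_t)((data & mask) >> shift);
	}
	return next;
}

/* True only for used, non-trunk entries whose port vector includes port;
 * trunk entries are skipped rather than misread as a port vector. */
bool sketch_entry_on_port(const struct sketch_atu_entry *entry, int port)
{
	assert(entry && port >= 0 && port < 16);
	if (entry->state == SKETCH_ATU_DATA_STATE_UNUSED || entry->trunk)
		return false;
	return (entry->portv_trunkid & (1u << port)) != 0;
}
```

With a trunk entry whose ID is 3, a plain (data >> 4) & (1 << port) test would wrongly match ports 0 and 1; the decoded form makes it easy to skip such entries.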

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6171.c |  1 +
 drivers/net/dsa/mv88e6352.c |  1 +
 drivers/net/dsa/mv88e6xxx.c | 97 +
 drivers/net/dsa/mv88e6xxx.h | 15 ++-
 4 files changed, 87 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index cfa21ed..b99fa50 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -116,6 +116,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
+   .port_fdb_getnext   = mv88e6xxx_port_fdb_getnext,
 };
 
 MODULE_ALIAS("platform:mv88e6171");
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index eb4630f..0a77135 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -341,6 +341,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
+   .port_fdb_getnext   = mv88e6xxx_port_fdb_getnext,
 };
 
 MODULE_ALIAS("platform:mv88e6172");
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9dad0a7..6cad168 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -964,7 +964,7 @@ static int _mv88e6xxx_atu_cmd(struct dsa_switch *ds, int fid, u16 cmd)
 {
int ret;
 
-   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, 0x01, fid);
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_ATU_FID, fid);
if (ret < 0)
return ret;
 
@@ -1269,12 +1269,14 @@ int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
return ret;
 }
 
-static int __mv88e6xxx_port_getnext(struct dsa_switch *ds, int port,
-   unsigned char *addr, bool *is_static)
+static int _mv88e6xxx_atu_getnext(struct dsa_switch *ds, u16 fid,
+ const u8 addr[ETH_ALEN],
+ struct mv88e6xxx_atu_entry *entry)
 {
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   u8 fid = ps->fid[port];
-   int ret, state;
+   struct mv88e6xxx_atu_entry next = { 0 };
+   int ret;
+
+   next.fid = fid;
 
ret = _mv88e6xxx_atu_wait(ds);
if (ret < 0)
@@ -1284,39 +1286,84 @@ static int __mv88e6xxx_port_getnext(struct dsa_switch *ds, int port,
if (ret < 0)
return ret;
 
-   do {
-   ret = _mv88e6xxx_atu_cmd(ds, fid,  GLOBAL_ATU_OP_GET_NEXT_DB);
-   if (ret < 0)
-   return ret;
+   ret = _mv88e6xxx_atu_cmd(ds, fid, GLOBAL_ATU_OP_GET_NEXT_DB);
+   if (ret < 0)
+   return ret;
 
-   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_ATU_DATA);
-   if (ret < 0)
-   return ret;
-   state = ret & GLOBAL_ATU_DATA_STATE_MASK;
-   if (state == GLOBAL_ATU_DATA_STATE_UNUSED)
-   return -ENOENT;
-   } while (!(((ret >> 4) & 0xff) & (1 << port)));
+   ret = _mv88e6xxx_atu_mac_read(ds, next.mac);
+   if (ret < 0)
+   return ret;
 
-   ret = _mv88e6xxx_atu_mac_read(ds, addr);
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_ATU_DATA);
if (ret < 0)
return ret;
 
-   *is_static = state == (is_multicast_ether_addr(addr) ?
-  GLOBAL_ATU_DATA_STATE_MC_STATIC :
-  GLOBAL_ATU_DATA_STATE_UC_STATIC);
+   next.state = ret & GLOBAL_ATU_DATA_STATE_MASK;
+   if (next.state != GLOBAL_ATU_DATA_STATE_UNUSED) {
+   unsigned int mask, shift;
+
+   if (ret & GLOBAL_ATU_DATA_TRUNK) {
+   next.trunk = true;
+   mask = GLOBAL_ATU_DATA_TRUNK_ID_MASK;
+   shift = GLOBAL_ATU_DATA_TRUNK_ID_SHIFT;
+   } else {
+   next.trunk = false;
+   mask = GLOBAL_ATU_DATA_PORT_VECTOR_MASK;
+   shift = GLOBAL_ATU_DATA_PORT_VECTOR_SHIFT;
+   }
+
+   next.portv_trunkid = (ret & mask) >> shift;
+   }
 
+   *entry = next;
return 0;
 }
 
-/* ge

Re: rtnl_mutex deadlock?

2015-08-05 Thread Herbert Xu

On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
> 
> Here's a theory and patch below. Herbert, Thomas, does this make any
> sense to you resp. sound plausible? ;)

Another possibility is the following bug:

https://patchwork.ozlabs.org/patch/503374/

It can cause a use-after-free which may lead to corruption of skb
state, including the cb buffer.  Of course it's a long shot.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register callbacks to process hvsock connection

2015-08-05 Thread Dexuan Cui

> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
> Of Dexuan Cui
> Sent: Thursday, July 30, 2015 18:20
> To: David Miller ; KY Srinivasan 
> Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com;
> driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> step...@networkplumber.org; stefa...@redhat.com; netdev@vger.kernel.org;
> a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com
> Subject: RE: [PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register 
> callbacks to
> process hvsock connection
> 
> > From: David Miller
> > Sent: Thursday, July 30, 2015 6:27
> >
> > From: Dexuan Cui
> > Date: Tue, 28 Jul 2015 05:35:11 -0700
> >
> > > With the 2 APIs supplied by the VMBus driver, the coming net/hvsock driver
> > > can register 2 callbacks and can know when a new hvsock connection is
> > > offered by the host, and when a hvsock connection is being closed by the
> > > host.
> > >
> > This is an extremely terrible interface.
> >
> > It's an opaque hook that allows one registry, and its sole purpose
> > is to allow a backdoor call into a foreign driver in another module.
> >
> > These are exactly the things we try to avoid.
> 
> Hi David,
> Thanks a lot for your review and the suggestion!
> 
> > Why not create a real abstraction where clients register an object,
> > that can be contained as a sub-member inside of their own driver
> > private, that provides the callback registry mechanism.

Hi David,
Can you please have a look at my questions below?

I like your idea of a real abstraction. Your answer would definitely
help me to implement that correctly. 
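
For reference, one possible reading of "clients register an object that is a sub-member of their driver private" is the embedded-struct pattern sketched below in userspace C. Every name here is hypothetical and none of it is an existing vmbus or hvsock API; it only shows how a callback can recover the client's private state and how its return value can reach the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client object that a bus driver could expose. */
struct sketch_bus_client {
	int (*on_offer)(struct sketch_bus_client *client, int channel_id);
	void (*on_close)(struct sketch_bus_client *client, int channel_id);
	struct sketch_bus_client *next;
};

/* Hypothetical bus with a simple list of registered clients. */
struct sketch_bus {
	struct sketch_bus_client *clients;
};

void sketch_bus_register(struct sketch_bus *bus,
			 struct sketch_bus_client *client)
{
	assert(bus && client);
	client->next = bus->clients;
	bus->clients = client;
}

/* Offers a channel to every client; stops at the first error so the
 * caller can clean up its own state. */
int sketch_bus_offer(struct sketch_bus *bus, int channel_id)
{
	struct sketch_bus_client *c;

	for (c = bus->clients; c; c = c->next) {
		if (c->on_offer) {
			int ret = c->on_offer(c, channel_id);

			if (ret)
				return ret;
		}
	}
	return 0;
}

/* A client driver embeds the object inside its private state. */
struct sketch_sock_priv {
	int offers_seen;
	struct sketch_bus_client client;
};

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int sketch_sock_on_offer(struct sketch_bus_client *client, int channel_id)
{
	struct sketch_sock_priv *priv =
		sketch_container_of(client, struct sketch_sock_priv, client);

	if (channel_id < 0)
		return -1; /* reject; the bus sees the error */
	priv->offers_seen++;
	return 0;
}
```

Because the bus walks a list, several clients can register at once, and the offer path still gets a synchronous return value from each of them.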

> Please pardon me for my inexperience.
> Can you please be a bit more specific?
> I guess maybe you're referencing a common design pattern in the driver
> code, so an example in some existing driver would be the best. :-)
> 
> "clients register an object " --
> does the "clients" mean the hvsock driver?
> and the "object" means the 2 callbacks?
> 
> IMHO, here the vmbus driver has to synchronously pass the 2 events
> to the hvsock driver, so a "backdoor call into the hvsock driver" is
> inevitable anyway?
> 
> e.g., in the path vmbus_process_offer() -> hvsock_process_offer(), the
> return value of the latter is important to the former, because on error
> the former needs to clean up some internal states of the vmbus driver (that
> is, the "goto err_deq_chan").
> 
> 
> > That way you can register multiple clients, do things like allow
> > AF_PACKET capturing of vmbus traffic, etc.
> 
> I thought AF_PACKET can only capture IP packets or Ethernet frames.
> Can it be used to capture AF_UNIX packets?
> If yes, I suppose we can consider making it work for AF_HYPERV too,
> if people ask for that.
> 
> -- Dexuan

Thanks,
-- Dexuan

Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core

2015-08-05 Thread Jason Gunthorpe

On Sun, Aug 02, 2015 at 10:56:38AM +0300, Matan Barak wrote:

> Indeed this design flaw was introduced when the first legacy verb was
> extended. I think that falling back from extended code to legacy code
> should be in the uverbs code. ib_uverbs_write will return -ENOSYS only
> if both extended and non-extended don't exist. The uverbs command itself
> will call the non-extended form if the comp_mask is zero and all
> data between legacy size and the given size are zero as well.
> What do you think?

Yes, that is what I had expected from day 1.. Didn't notice it in the
first introduction.

Thanks,
Jason
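
The fallback rule Matan proposes can be sketched in plain C as follows. This is only an illustration under assumed names (`ext_cmd_fits_legacy` and a flat byte buffer for the command); it is not the uverbs code, and the real command layout is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether an extended command can be served by
 * the legacy handler. That is the case when comp_mask is zero and every
 * byte between the legacy size and the size the caller passed is zero.
 */
static bool ext_cmd_fits_legacy(const uint8_t *cmd, size_t legacy_size,
				size_t given_size, uint64_t comp_mask)
{
	size_t i;

	if (comp_mask != 0)
		return false;
	if (given_size < legacy_size)
		return false;	/* too short even for the legacy layout */

	for (i = legacy_size; i < given_size; i++) {
		if (cmd[i] != 0)
			return false;
	}
	return true;
}
```

A dispatcher built on this check would try the extended handler first, fall back to the legacy one when the helper returns true, and report -ENOSYS only when neither form exists.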

RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when hvsock's callback is running

2015-08-05 Thread Dexuan Cui

> -----Original Message-----
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
> Of Dexuan Cui
> Sent: Thursday, July 30, 2015 18:18
> To: David Miller ; KY Srinivasan 
> Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com;
> driverdev-de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> step...@networkplumber.org; stefa...@redhat.com; netdev@vger.kernel.org;
> a...@canonical.com; pebo...@tiscali.nl; dan.carpen...@oracle.com
> Subject: RE: [PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when
> hvsock's callback is running
> 
> > From: David Miller
> > Sent: Thursday, July 30, 2015 6:28
> > > From: Dexuan Cui 
> > > Date: Tue, 28 Jul 2015 05:35:30 -0700
> > >
> > > In the SMP guest case, when the per-channel callback hvsock_events() is
> > > running on virtual CPU A, if the guest tries to close the connection on
> > > virtual CPU B: we invoke vmbus_close() -> vmbus_close_internal(),
> > > then we can have trouble: on B, vmbus_close_internal() will send IPI
> > > reset_channel_cb() to A, trying to set channel->onchannel_callback to NULL;
> > > on A, if the IPI handler happens between
> > > "if (channel->onchannel_callback != NULL)" and invoking
> > > channel->onchannel_callback, we'll invoke a function pointer of NULL.
> > >
> > > This is why the patch is necessary.
> > >
> > Sorry, I do not accept that you must use conditional locking and/or
> > IRQ disabling.
> >
> > Boil it down to what is necessary for the least common denominator,
> > and use that unconditionally.
> 
> Hi David,
> Thanks for the comment!
> 
> I agree with you it's not clean to use conditional IRQ disabling.
> 
> Here I didn't use unconditional IRQ disabling because the Hyper-V netvsc
> and storvsc drivers' vmbus event callbacks (i.e. netvsc_channel_cb() and
> storvsc_on_channel_callback()) may take a relatively long time (e.g., netvsc can
> operate at a speed of 10Gb) and I think it's bad to disable IRQs for a long time
> when the callbacks are running in a tasklet context, e.g., the Hyper-V timer
> can be affected: see vmbus_isr() -> hv_process_timer_expiration().
> 
> To resolve the race condition between vmbus_close_internal() and
> process_chn_event() in SMP case, now I propose a new method:
> 
> we can serialize the 2 paths by adding
> tasklet_disable(hv_context.event_dpc[channel->target_cpu]) and
> tasklet_enable(...) in vmbus_close_internal().
> 
> In this way, we need the least change and we can drop this patch.
> 
> Please let me know your opinion.
> 
> -- Dexuan

Hi David, KY and all,

May I know your opinion about my idea of adding tasklet_disable/enable()
in vmbus_close_internal() and dropping this patch?

Thanks,
-- Dexuan

[PATCH net-next v4 2/4] ip_gre: Add support to collect tunnel metadata.

2015-08-05 Thread Pravin B Shelar

The following patch creates a new tunnel flag which enables
tunnel metadata collection on a given device.

Signed-off-by: Pravin B Shelar 
---
 include/net/ip_tunnels.h   |   7 +-
 include/uapi/linux/if_tunnel.h |   1 +
 net/ipv4/ip_gre.c  | 195 +
 net/ipv4/ip_tunnel.c   |  37 ++--
 net/ipv4/ipip.c|   2 +-
 net/ipv6/sit.c |   2 +-
 6 files changed, 216 insertions(+), 28 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 4798441..984dbfa 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -82,6 +82,8 @@ struct ip_tunnel_dst {
__be32   saddr;
 };
 
+struct metadata_dst;
+
 struct ip_tunnel {
struct ip_tunnel __rcu  *next;
struct hlist_node hash_node;
@@ -115,6 +117,7 @@ struct ip_tunnel {
unsigned intprl_count;  /* # of entries in PRL */
int ip_tnl_net_id;
struct gro_cellsgro_cells;
+   boolcollect_md;
 };
 
 #define TUNNEL_CSUM__cpu_to_be16(0x01)
@@ -149,6 +152,7 @@ struct tnl_ptk_info {
 struct ip_tunnel_net {
struct net_device *fb_tunnel_dev;
struct hlist_head tunnels[IP_TNL_HASH_SIZE];
+   struct ip_tunnel __rcu *collect_md_tun;
 };
 
 struct ip_tunnel_encap_ops {
@@ -235,7 +239,8 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
   __be32 key);
 
 int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
- const struct tnl_ptk_info *tpi, bool log_ecn_error);
+ const struct tnl_ptk_info *tpi, struct metadata_dst *tun_dst,
+ bool log_ecn_error);
 int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
 struct ip_tunnel_parm *p);
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index bd3cc11..af4de90 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -112,6 +112,7 @@ enum {
IFLA_GRE_ENCAP_FLAGS,
IFLA_GRE_ENCAP_SPORT,
IFLA_GRE_ENCAP_DPORT,
+   IFLA_GRE_COLLECT_METADATA,
__IFLA_GRE_MAX,
 };
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..554a760 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -47,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
@@ -200,9 +202,29 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
return PACKET_RCVD;
 }
 
+static __be64 key_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)((__force u32)key);
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 tunnel_id_to_key(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)x;
+#else
+   return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
 static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 {
struct net *net = dev_net(skb->dev);
+   struct metadata_dst *tun_dst = NULL;
struct ip_tunnel_net *itn;
const struct iphdr *iph;
struct ip_tunnel *tunnel;
@@ -218,40 +240,162 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 
if (tunnel) {
skb_pop_mac_header(skb);
-   ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
+   if (tunnel->collect_md) {
+   struct ip_tunnel_info *info;
+
+   tun_dst = metadata_dst_alloc(0, GFP_ATOMIC);
+   if (!tun_dst)
+   return PACKET_REJECT;
+
+   info = &tun_dst->u.tun_info;
+   info->key.ipv4_src = iph->saddr;
+   info->key.ipv4_dst = iph->daddr;
+   info->key.ipv4_tos = iph->tos;
+   info->key.ipv4_ttl = iph->ttl;
+
+   info->mode = IP_TUNNEL_INFO_RX;
+   info->key.tun_flags = tpi->flags &
+ (TUNNEL_CSUM | TUNNEL_KEY);
+   info->key.tun_id = key_to_tunnel_id(tpi->key);
+
+   info->key.tp_src = 0;
+   info->key.tp_dst = 0;
+   }
+
+   ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
return PACKET_RCVD;
}
return PACKET_REJECT;
 }
 
+static void build_header(struct sk_buff *skb, int hdr_len, __be16 flags,
+__be16 proto, __be32 key, __be32 seq)
+{
+   struct gre_base_hdr *greh;
+
+   skb_push(skb, hdr_len);
+
+   skb_reset_t
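
The two key helpers in the hunk above place the 32-bit GRE key in the least-significant half of a 64-bit big-endian tunnel ID. The following userspace sketch shows the same conversion. It assumes a GCC/Clang-style `__BYTE_ORDER__` macro in place of the kernel's `__BIG_ENDIAN`, and plain `uint32_t`/`uint64_t` in place of `__be32`/`__be64`; `key_from_wire` is a hypothetical helper added only so the byte layout can be inspected.

```c
#include <stdint.h>
#include <string.h>

/* key and the return value both hold network-order (big-endian) bytes. */
static uint64_t key_to_tunnel_id(uint32_t key)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint64_t)key;
#else
	return (uint64_t)key << 32;
#endif
}

/* Inverse: take the least-significant 32 bits of the big-endian ID. */
static uint32_t tunnel_id_to_key(uint64_t id)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint32_t)id;
#else
	return (uint32_t)(id >> 32);
#endif
}

/* Hypothetical helper: load a key from four wire-order bytes. */
static uint32_t key_from_wire(const uint8_t b[4])
{
	uint32_t key;

	memcpy(&key, b, sizeof(key));
	return key;
}
```

On either byte order, the in-memory bytes of the resulting ID are four zero bytes followed by the four key bytes in wire order, which is why the shift direction differs between the two branches.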

[PATCH net-next v4 4/4] gre: Remove support for sharing GRE protocol hook.

2015-08-05 Thread Pravin B Shelar

Support for sharing the GREPROTO_CISCO port was added so that
the OVS GRE port and kernel GRE devices can co-exist. After the
flow-based tunneling patches, OVS GRE protocol processing
is completely moved to the ip_gre module, so there is no need
for the GRE protocol hook. The following patch consolidates
GRE protocol related functions into the ip_gre module.

Signed-off-by: Pravin B Shelar 
---
 include/net/gre.h|  80 ++-
 net/ipv4/gre_demux.c | 201 +--
 net/ipv4/ip_gre.c| 215 +++
 3 files changed, 206 insertions(+), 290 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index e3e0845..97eafdc 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -4,6 +4,12 @@
 #include 
 #include 
 
+struct gre_base_hdr {
+   __be16 flags;
+   __be16 protocol;
+};
+#define GRE_HEADER_SECTION 4
+
 #define GREPROTO_CISCO 0
 #define GREPROTO_PPTP  1
 #define GREPROTO_MAX   2
@@ -14,83 +20,9 @@ struct gre_protocol {
void (*err_handler)(struct sk_buff *skb, u32 info);
 };
 
-struct gre_base_hdr {
-   __be16 flags;
-   __be16 protocol;
-};
-#define GRE_HEADER_SECTION 4
-
 int gre_add_protocol(const struct gre_protocol *proto, u8 version);
 int gre_del_protocol(const struct gre_protocol *proto, u8 version);
 
-struct gre_cisco_protocol {
-   int (*handler)(struct sk_buff *skb, const struct tnl_ptk_info *tpi);
-   int (*err_handler)(struct sk_buff *skb, u32 info,
-  const struct tnl_ptk_info *tpi);
-   u8 priority;
-};
-
-int gre_cisco_register(struct gre_cisco_protocol *proto);
-int gre_cisco_unregister(struct gre_cisco_protocol *proto);
-
 struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
   u8 name_assign_type);
-
-static inline int ip_gre_calc_hlen(__be16 o_flags)
-{
-   int addend = 4;
-
-   if (o_flags&TUNNEL_CSUM)
-   addend += 4;
-   if (o_flags&TUNNEL_KEY)
-   addend += 4;
-   if (o_flags&TUNNEL_SEQ)
-   addend += 4;
-   return addend;
-}
-
-static inline __be16 gre_flags_to_tnl_flags(__be16 flags)
-{
-   __be16 tflags = 0;
-
-   if (flags & GRE_CSUM)
-   tflags |= TUNNEL_CSUM;
-   if (flags & GRE_ROUTING)
-   tflags |= TUNNEL_ROUTING;
-   if (flags & GRE_KEY)
-   tflags |= TUNNEL_KEY;
-   if (flags & GRE_SEQ)
-   tflags |= TUNNEL_SEQ;
-   if (flags & GRE_STRICT)
-   tflags |= TUNNEL_STRICT;
-   if (flags & GRE_REC)
-   tflags |= TUNNEL_REC;
-   if (flags & GRE_VERSION)
-   tflags |= TUNNEL_VERSION;
-
-   return tflags;
-}
-
-static inline __be16 tnl_flags_to_gre_flags(__be16 tflags)
-{
-   __be16 flags = 0;
-
-   if (tflags & TUNNEL_CSUM)
-   flags |= GRE_CSUM;
-   if (tflags & TUNNEL_ROUTING)
-   flags |= GRE_ROUTING;
-   if (tflags & TUNNEL_KEY)
-   flags |= GRE_KEY;
-   if (tflags & TUNNEL_SEQ)
-   flags |= GRE_SEQ;
-   if (tflags & TUNNEL_STRICT)
-   flags |= GRE_STRICT;
-   if (tflags & TUNNEL_REC)
-   flags |= GRE_REC;
-   if (tflags & TUNNEL_VERSION)
-   flags |= GRE_VERSION;
-
-   return flags;
-}
-
 #endif
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 77562e0..d9c552a 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -31,7 +31,6 @@
 #include 
 
 static const struct gre_protocol __rcu *gre_proto[GREPROTO_MAX] __read_mostly;
-static struct gre_cisco_protocol __rcu *gre_cisco_proto_list[GRE_IP_PROTO_MAX];
 
 int gre_add_protocol(const struct gre_protocol *proto, u8 version)
 {
@@ -61,163 +60,6 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version)
 }
 EXPORT_SYMBOL_GPL(gre_del_protocol);
 
-static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
-   bool *csum_err)
-{
-   const struct gre_base_hdr *greh;
-   __be32 *options;
-   int hdr_len;
-
-   if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
-   return -EINVAL;
-
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
-   if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
-   return -EINVAL;
-
-   tpi->flags = gre_flags_to_tnl_flags(greh->flags);
-   hdr_len = ip_gre_calc_hlen(tpi->flags);
-
-   if (!pskb_may_pull(skb, hdr_len))
-   return -EINVAL;
-
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
-   tpi->proto = greh->protocol;
-
-   options = (__be32 *)(greh + 1);
-   if (greh->flags & GRE_CSUM) {
-   if (skb_checksum_simple_validate(skb)) {
-   *csum_err = true;
-   return -EINVAL;
-   }
-
-   skb_checksum_try_convert(skb, IPPROTO_GR
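
As a side note on the `ip_gre_calc_hlen()` helper that this diff removes from `include/net/gre.h`: it computes the GRE header length as a 4-byte base header plus 4 bytes for each optional field that is present (checksum, key, sequence number). A userspace sketch of that arithmetic follows. The `SKETCH_*` flag values are hypothetical host-order stand-ins chosen for the example, not the kernel's `TUNNEL_*` definitions, which are big-endian constants.

```c
#include <stdint.h>

/* Hypothetical host-order flag bits, for this sketch only. */
#define SKETCH_CSUM 0x01u
#define SKETCH_KEY  0x04u
#define SKETCH_SEQ  0x08u

#define GRE_BASE_HDR_LEN 4	/* flags (2 bytes) + protocol (2 bytes) */
#define GRE_OPTION_LEN   4	/* each optional field is one 32-bit word */

static int gre_calc_hlen(uint16_t flags)
{
	int len = GRE_BASE_HDR_LEN;

	if (flags & SKETCH_CSUM)
		len += GRE_OPTION_LEN;	/* checksum + reserved */
	if (flags & SKETCH_KEY)
		len += GRE_OPTION_LEN;
	if (flags & SKETCH_SEQ)
		len += GRE_OPTION_LEN;
	return len;
}
```

With no options the header is 4 bytes; with all three it is 16 bytes.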

[PATCH net-next v4 1/4] openvswitch: Move tunnel destroy function to openvswitch module.

2015-08-05 Thread Pravin B Shelar

This function will be used in gre and geneve vport implementations.

Signed-off-by: Pravin B Shelar 
---
 net/openvswitch/vport-netdev.c | 21 ++---
 net/openvswitch/vport-netdev.h |  2 +-
 net/openvswitch/vport-vxlan.c  | 17 +
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cddb706..4b70aaa 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -147,7 +147,7 @@ static struct vport *netdev_create(const struct vport_parms *parms)
return ovs_netdev_link(vport, parms->name);
 }
 
-void ovs_vport_free_rcu(struct rcu_head *rcu)
+static void vport_netdev_free(struct rcu_head *rcu)
 {
struct vport *vport = container_of(rcu, struct vport, rcu);
 
@@ -155,7 +155,6 @@ void ovs_vport_free_rcu(struct rcu_head *rcu)
dev_put(vport->dev);
ovs_vport_free(vport);
 }
-EXPORT_SYMBOL_GPL(ovs_vport_free_rcu);
 
 void ovs_netdev_detach_dev(struct vport *vport)
 {
@@ -175,9 +174,25 @@ static void netdev_destroy(struct vport *vport)
ovs_netdev_detach_dev(vport);
rtnl_unlock();
 
-   call_rcu(&vport->rcu, ovs_vport_free_rcu);
+   call_rcu(&vport->rcu, vport_netdev_free);
 }
 
+void ovs_netdev_tunnel_destroy(struct vport *vport)
+{
+   rtnl_lock();
+   if (vport->dev->priv_flags & IFF_OVS_DATAPATH)
+   ovs_netdev_detach_dev(vport);
+
+   /* Early release so we can unregister the device */
+   dev_put(vport->dev);
+   rtnl_delete_link(vport->dev);
+   vport->dev = NULL;
+   rtnl_unlock();
+
+   call_rcu(&vport->rcu, vport_netdev_free);
+}
+EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
+
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 8044126..497cc81 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -29,9 +29,9 @@ struct vport *ovs_netdev_get_vport(struct net_device *dev);
 struct vport *ovs_netdev_link(struct vport *vport, const char *name);
 int ovs_netdev_send(struct vport *vport, struct sk_buff *skb);
 void ovs_netdev_detach_dev(struct vport *);
-void ovs_vport_free_rcu(struct rcu_head *);
 
 int __init ovs_netdev_init(void);
 void ovs_netdev_exit(void);
 
+void ovs_netdev_tunnel_destroy(struct vport *vport);
 #endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 5471733..d8d0384 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -146,21 +146,6 @@ static struct vport *vxlan_create(const struct vport_parms *parms)
return ovs_netdev_link(vport, parms->name);
 }
 
-static void vxlan_destroy(struct vport *vport)
-{
-   rtnl_lock();
-   if (vport->dev->priv_flags & IFF_OVS_DATAPATH)
-   ovs_netdev_detach_dev(vport);
-
-   /* Early release so we can unregister the device */
-   dev_put(vport->dev);
-   rtnl_delete_link(vport->dev);
-   vport->dev = NULL;
-   rtnl_unlock();
-
-   call_rcu(&vport->rcu, ovs_vport_free_rcu);
-}
-
 static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
 struct ip_tunnel_info *egress_tun_info)
 {
@@ -183,7 +168,7 @@ static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
 static struct vport_ops ovs_vxlan_netdev_vport_ops = {
.type   = OVS_VPORT_TYPE_VXLAN,
.create = vxlan_create,
-   .destroy= vxlan_destroy,
+   .destroy= ovs_netdev_tunnel_destroy,
.get_options= vxlan_get_options,
.send   = ovs_netdev_send,
.get_egress_tun_info= vxlan_get_egress_tun_info,
-- 
1.8.3.1


[PATCH net-next v4 3/4] openvswitch: Use regular GRE net_device instead of vport

2015-08-05 Thread Pravin B Shelar

Using flow based tunneling, we can implement the
OVS GRE vport. This patch removes all of the OVS
specific GRE code and makes OVS use an ip_gre net_device.
A minimal GRE vport is kept to handle compatibility with
current userspace applications.

Signed-off-by: Pravin B Shelar 
---
 include/net/gre.h   |  12 +--
 net/ipv4/gre_demux.c|  34 ---
 net/ipv4/ip_gre.c   |  36 +++
 net/openvswitch/Kconfig |   1 -
 net/openvswitch/vport-gre.c | 237 
 5 files changed, 60 insertions(+), 260 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index b531820..e3e0845 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -33,16 +33,8 @@ struct gre_cisco_protocol {
 int gre_cisco_register(struct gre_cisco_protocol *proto);
 int gre_cisco_unregister(struct gre_cisco_protocol *proto);
 
-void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
- int hdr_len);
-
-static inline struct sk_buff *gre_handle_offloads(struct sk_buff *skb,
- bool csum)
-{
-   return iptunnel_handle_offloads(skb, csum,
-   csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
-}
-
+struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
+  u8 name_assign_type);
 
 static inline int ip_gre_calc_hlen(__be16 o_flags)
 {
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 4a7b5b2..77562e0 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -61,40 +61,6 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version)
 }
 EXPORT_SYMBOL_GPL(gre_del_protocol);
 
-void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
- int hdr_len)
-{
-   struct gre_base_hdr *greh;
-
-   skb_push(skb, hdr_len);
-
-   skb_reset_transport_header(skb);
-   greh = (struct gre_base_hdr *)skb->data;
-   greh->flags = tnl_flags_to_gre_flags(tpi->flags);
-   greh->protocol = tpi->proto;
-
-   if (tpi->flags&(TUNNEL_KEY|TUNNEL_CSUM|TUNNEL_SEQ)) {
-   __be32 *ptr = (__be32 *)(((u8 *)greh) + hdr_len - 4);
-
-   if (tpi->flags&TUNNEL_SEQ) {
-   *ptr = tpi->seq;
-   ptr--;
-   }
-   if (tpi->flags&TUNNEL_KEY) {
-   *ptr = tpi->key;
-   ptr--;
-   }
-   if (tpi->flags&TUNNEL_CSUM &&
-   !(skb_shinfo(skb)->gso_type &
- (SKB_GSO_GRE|SKB_GSO_GRE_CSUM))) {
-   *ptr = 0;
-   *(__sum16 *)ptr = csum_fold(skb_checksum(skb, 0,
-skb->len, 0));
-   }
-   }
-}
-EXPORT_SYMBOL_GPL(gre_build_header);
-
 static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
bool *csum_err)
 {
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 554a760..49d1402 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -318,6 +318,13 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
 }
 
+static struct sk_buff *gre_handle_offloads(struct sk_buff *skb,
+  bool csum)
+{
+   return iptunnel_handle_offloads(skb, csum,
+   csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
+}
+
 static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct ip_tunnel_info *tun_info;
@@ -1012,6 +1019,35 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
.get_link_net   = ip_tunnel_get_link_net,
 };
 
+struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
+   u8 name_assign_type)
+{
+   struct nlattr *tb[IFLA_MAX + 1];
+   struct net_device *dev;
+   struct ip_tunnel *t;
+   int err;
+
+   memset(&tb, 0, sizeof(tb));
+
+   dev = rtnl_create_link(net, name, name_assign_type,
+  &ipgre_tap_ops, tb);
+   if (IS_ERR(dev))
+   return dev;
+
+   /* Configure flow based GRE device. */
+   t = netdev_priv(dev);
+   t->collect_md = true;
+
+   err = ipgre_newlink(net, dev, tb, NULL);
+   if (err < 0)
+   goto out;
+   return dev;
+out:
+   free_netdev(dev);
+   return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(gretap_fb_dev_create);
+
 static int __net_init ipgre_tap_init_net(struct net *net)
 {
return ip_tunnel_init_net(net, gre_tap_net_id, &ipgre_tap_ops, "gretap0");
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 1584040..c56f4d4 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -34,7 +34,6 @@ config OPENVSWITCH
 config OPENVSWITCH_GRE

[PATCH net-next v4 0/4] GRE: Use flow based tunneling for OVS GRE vport.

2015-08-05 Thread Pravin B Shelar

The following patches make use of the new flow based tunneling
API from the kernel. This allows us to directly use the netdev
based GRE tunnel implementation. While doing so I have
removed the GRE demux API which was targeted at OVS. Most
of the GRE protocol code is now consolidated in the ip_gre module.

v3-v4:
Added interface to ip-gre device to enable meta data collection.
While doing this I split second patch into two patches.

v2-v3:
Add API to create GRE flow based device.
---

Pravin B Shelar (4):
  openvswitch: Move tunnel destroy function to openvswitch module.
  ip_gre: Add support to collect tunnel metadata.
  openvswitch: Use regular GRE net_device instead of vport
  gre: Remove support for sharing GRE protocol hook.

 include/net/gre.h  |  92 +
 include/net/ip_tunnels.h   |   7 +-
 include/uapi/linux/if_tunnel.h |   1 +
 net/ipv4/gre_demux.c   | 235 +-
 net/ipv4/ip_gre.c  | 446 ++---
 net/ipv4/ip_tunnel.c   |  37 +++-
 net/ipv4/ipip.c|   2 +-
 net/ipv6/sit.c |   2 +-
 net/openvswitch/Kconfig|   1 -
 net/openvswitch/vport-gre.c| 237 ++
 net/openvswitch/vport-netdev.c |  21 +-
 net/openvswitch/vport-netdev.h |   2 +-
 net/openvswitch/vport-vxlan.c  |  17 +-
 13 files changed, 502 insertions(+), 598 deletions(-)

-- 
1.8.3.1


[PATCH] [trivial] net:wimax: Fix double word "the the" in networking.xml

2015-08-05 Thread Masanari Iida

This patch fixes a double word "the the"
in Documentation/DocBook/networking.xml and
Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html.

These files are generated from comments in the source, so I had to
fix the typo in net/wimax/op-rfkill.c

Signed-off-by: Masanari Iida 
---
 net/wimax/op-rfkill.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/wimax/op-rfkill.c b/net/wimax/op-rfkill.c
index 7d73054..477364a 100644
--- a/net/wimax/op-rfkill.c
+++ b/net/wimax/op-rfkill.c
@@ -135,8 +135,7 @@ EXPORT_SYMBOL_GPL(wimax_report_rfkill_hw);
  * @state: New state of the RF kill switch. %WIMAX_RF_ON radio on,
  * %WIMAX_RF_OFF radio off.
  *
- * Reports changes in the software RF switch state to the the WiMAX
- * stack.
+ * Reports changes in the software RF switch state to the WiMAX stack.
  *
  * The main use is during initialization, so the driver can query the
  * device for its current software radio kill switch state and feed it
-- 
2.5.0


Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that gets the selected hardware PMU counter

2015-08-05 Thread xiakaixu

On 2015/8/5 21:53, Peter Zijlstra wrote:
> On Wed, Aug 05, 2015 at 12:04:25PM +0200, Peter Zijlstra wrote:
>> Also, you probably want a WARN_ON(in_nmi()) there, this function is
>> _NOT_ NMI safe.
> 
> I had a wee think about that, and I think the below is safe.
> 
> (with the obvious problem that WARN from NMI context is not safe)
> 
> It does not give you up-to-date overcommit times but your version didn't
> either so I'm assuming you don't need those, if you do need those it
> needs more but we can do that too.
> 
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c   | 53 ++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2027809433b3..64e821dd64f0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -659,6 +659,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
>   void *context);
>  extern void perf_pmu_migrate_context(struct pmu *pmu,
>   int src_cpu, int dst_cpu);
> +extern u64 perf_event_read_local(struct perf_event *event);
>  extern u64 perf_event_read_value(struct perf_event *event,
>u64 *enabled, u64 *running);
>  
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 39753bfd9520..7105d37763c1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3222,6 +3222,59 @@ static inline u64 perf_event_count(struct perf_event *event)
>   return __perf_event_count(event);
>  }
>  
> +/*
> + * NMI-safe method to read a local event, that is an event that
> + * is:
> + *   - either for the current task, or for this CPU
> + *   - does not have inherit set, for inherited task events
> + * will not be local and we cannot read them atomically
> + *   - must not have a pmu::count method
> + */
> +u64 perf_event_read_local(struct perf_event *event)
> +{
> + unsigned long flags;
> + u64 val;
> +
> + /*
> +  * Disabling interrupts avoids all counter scheduling (context
> +  * switches, timer based rotation and IPIs).
> +  */
> + local_irq_safe(flags);

s/local_irq_safe/local_irq_save/. I have compiled and tested this function
and it works fine. I will use it in the next patch set.

Thanks.
> +
> + /* If this is a per-task event, it must be for current */
> + WARN_ON_ONCE((event->attach_state & PERF_ATTACH_TASK) &&
> +  event->hw.target != current);
> +
> + /* If this is a per-CPU event, it must be for this CPU */
> + WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_TASK) &&
> +  event->cpu != smp_processor_id());
> +
> + /*
> +  * It must not be an event with inherit set, we cannot read
> +  * all child counters from atomic context.
> +  */
> + WARN_ON_ONCE(event->attr.inherit);
> +
> + /*
> +  * It must not have a pmu::count method, those are not
> +  * NMI safe.
> +  */
> + WARN_ON_ONCE(event->pmu->count);
> +
> + /*
> +  * If the event is currently on this CPU, its either a per-task event,
> +  * or local to this CPU. Furthermore it means its ACTIVE (otherwise
> +  * oncpu == -1).
> +  */
> + if (event->oncpu == smp_processor_id())
> + event->pmu->read(event);
> +
> + val = local64_read(&event->count);
> + local_irq_restore(flags);
> +
> + return val;
> +}
> +
>  static u64 perf_event_read(struct perf_event *event)
>  {
>   /*
> 
> .
> 



Re: [RFC PATCH v5 net-next 4/4] tcp: add NV congestion control

2015-08-05 Thread Kenneth Klette Jonassen

On Wed, Aug 5, 2015 at 3:39 AM, Lawrence Brakmo  wrote:
> This is a request for comments.

Nice to see more development on delay-based congestion control.

It would be good to see how NV stacks up against CDG. Any chance of
adding cdg as a congestion control parameter to your experiments?
Experiments on NV without its temporary cwnd reductions would also be
of interest -- to get a reference of how effective this mechanism is.


> +#define NV_INIT_RTT  0x

Maybe use U32_MAX?


> +static void tcpnv_init(struct sock *sk)
> +{
> +   struct tcpnv *ca = inet_csk_ca(sk);
> +
> +   tcpnv_reset(ca, sk);
> +
> +   ca->nv_min_rtt_reset_jiffies = jiffies + 2*HZ;
> +   ca->nv_min_rtt = NV_INIT_RTT;
> +   ca->nv_min_rtt_new = NV_INIT_RTT;
> +   ca->nv_enable = nv_enable;

Can this assignment be ca->nv_enable = 1? That would match the
TCP_CA_Open case in tcpnv_state().


> +   if (nv_dec_eval_min_calls > 255)
> +   nv_dec_eval_min_calls = 255;
> +   if (nv_rtt_min_cnt > 63)
> +   nv_rtt_min_cnt = 63;

nv_dec_eval_min_calls can be clamped to 0-255 by changing its type to u8.

nv_rtt_min_cnt can also be u8? In struct tcpnv, perhaps move
nv_rtt_cnt to the available byte.


> +static void tcpnv_cong_avoid(struct sock *sk, u32 ack, u32 acked)
> +{
> +   struct tcp_sock *tp = tcp_sk(sk);
> +   struct tcpnv *ca = inet_csk_ca(sk);
> +
> +   if (!tcp_is_cwnd_limited(sk))
> +   return;
> +
> +   /* Only grow cwnd if NV has not detected congestion */
> +   if (nv_enable && ca->nv_enable && !ca->nv_allow_cwnd_growth)
> +   return;

The check for ca->nv_enable might be overly harsh on some unfortunate
sockets in TCP_CA_Disorder. Is it needed here?


> +static void tcpnv_acked(struct sock *sk, struct ack_sample *sample)

Maybe move some of this function to tcpnv_cong_avoid()?


> +{
> +   const struct inet_connection_sock *icsk = inet_csk(sk);
> +   struct tcp_sock *tp = tcp_sk(sk);
> +   struct tcpnv *ca = inet_csk_ca(sk);
> +   unsigned long now = jiffies;
> +   s64 rate64 = 0;
> +   u32 rate, max_win, cwnd_by_slope;
> +   u32 avg_rtt;
> +   u32 bytes_acked = 0;
> +
> +   /* Some calls are for duplicates without timestamps */
> +   if (sample->rtt_us < 0)
> +   return;
> +
> +   /* If not in TCP_CA_Open state, skip. */
> +   if (icsk->icsk_ca_state != TCP_CA_Open)
> +   return;

Consider using samples in other states too, especially
TCP_CA_Disorder. Linux 4.2 enhances RTT sampling from SACKs, so any
non-negative RTT sample should be fully usable.

Re: [net-next 07/15] i40e/i40evf: Add TX/RX outer UDP checksum support for X722

2015-08-05 Thread Jesse Brandeburg

On Wed, 5 Aug 2015 17:13:21 -0700
Tom Herbert  wrote:

> On Wed, Aug 5, 2015 at 4:52 PM, Jeff Kirsher
>  wrote:
> > From: Anjali Singhai Jain 
> > if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
> > tx_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
> > +   if (vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE)
> > +   tx_ring->flags |= I40E_TXR_FLAGS_OUTER_UDP_CSUM;
> 
> Just curious... is there a difference between enabling the outer UDP
> checksum (of a tunnel) and just enabling checksum offload for UDP
> packets?

Yes, the hardware knows the difference (or we actually tell it
the difference) between a UDP packet and a tunnel inside a UDP
packet.

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Florian Fainelli

On 05/08/15 16:16, Stephen Hemminger wrote:
> Something like this:
> 
> Subject: [PATCH net-next] sky2: use random address if EEPROM is bad
> 
> On some embedded systems the EEPROM does not contain a valid MAC address.
> In that case it is better to fall back to a generated MAC address and
> let init scripts fix the value later.
> 
> Reported-by: Liviu Dudau 
> Signed-off-by: Stephen Hemminger 
> 
> 
> --- a/drivers/net/ethernet/marvell/sky2.c 2015-05-21 15:13:03.621126050 -0700
> +++ b/drivers/net/ethernet/marvell/sky2.c 2015-08-05 16:12:38.734534467 -0700
> @@ -4819,6 +4819,16 @@ static struct net_device *sky2_init_netd
>   memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 + port * 8,
> ETH_ALEN);
>  
> + /* if the address is invalid, use a random value */
> + if (!is_valid_ether_addr(dev->dev_addr)) {
> + struct sockaddr sa = { AF_UNSPEC };
> +
> + netdev_warn(dev,
> +  "Invalid MAC address defaulting to random\n");
> + sky2_set_mac_address(dev, &sa);
> + dev->addr_assign_type |= NET_ADDR_RANDOM;

There is a helper for that: eth_hw_addr_random() which sets the
addr_assign_type for you and copies the address to dev->dev_addr.
-- 
Florian
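
For context, the fallback under discussion amounts to two steps: check that the address is neither all-zero nor multicast, and otherwise generate a random unicast, locally administered address. The userspace sketch below illustrates that logic only. It uses `rand()` as a stand-in for the kernel's random source, and the function names (`mac_is_valid`, `mac_make_random`) are hypothetical; in the driver itself the kernel helpers named in this thread should be used instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6

/* Valid here means: not all-zero and not a multicast/broadcast address. */
static bool mac_is_valid(const uint8_t addr[MAC_LEN])
{
	uint8_t any = 0;
	int i;

	for (i = 0; i < MAC_LEN; i++)
		any |= addr[i];

	/* Bit 0 of the first octet marks multicast (and broadcast). */
	return any != 0 && !(addr[0] & 0x01);
}

/* Fill addr with a random unicast, locally administered address. */
static void mac_make_random(uint8_t addr[MAC_LEN])
{
	int i;

	for (i = 0; i < MAC_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}
```

Setting the locally administered bit keeps the generated address out of the vendor-assigned space, and it also guarantees the result is never all-zero.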

Re: rtnl_mutex deadlock?

2015-08-05 Thread Herbert Xu

On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
>
> Here's a theory and patch below. Herbert, Thomas, does this make any
> sense to you resp. sound plausible? ;)

It's certainly possible.  Whether it's plausible I'm not so sure.
The netlink hashtable is unlimited in size.  So it should always
be expanding, not rehashing.  The bug you found should only affect
rehashing.

> I'm not quite sure what's best to return from here, i.e. whether we
> propagate -ENOMEM or instead retry over and over again hoping that the
> rehashing completed (and no new rehashing started in the mean time) ...

Please use something other than ENOMEM as it is already heavily
used in this context.  Perhaps EOVERFLOW?

We should probably add a WARN_ON_ONCE in rhashtable_insert_rehash
since two concurrent rehashings indicates something is going
seriously wrong.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread David Miller

From: Liviu Dudau 
Date: Wed,  5 Aug 2015 16:50:54 +0100

> For designs where EEPROMs are not connected to PCI Yukon2
> chips we need to get the MAC address from the firmware.
> Add a module parameter called 'mac_address' for this. It
> will be used if no DT node can be found and the B2_MAC
> register holds an invalid value.
> 
> Signed-off-by: Liviu Dudau 

Sorry, such module options are absolutely not allowed.

If an invalid MAC is present, it should be set to a random
one via eth_random_addr().

Re: [net-next 07/15] i40e/i40evf: Add TX/RX outer UDP checksum support for X722

2015-08-05 Thread Tom Herbert

On Wed, Aug 5, 2015 at 4:52 PM, Jeff Kirsher
 wrote:
> From: Anjali Singhai Jain 
>
> X722 supports offloading of outer UDP TX and RX checksum for tunneled
> packets. This patch exposes the support and leaves it enabled by
> default.
>
> Signed-off-by: Anjali Singhai Jain 
> Signed-off-by: Catherine Sullivan 
> Tested-by: Jim Young 
> Signed-off-by: Jeff Kirsher 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 ++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 16 +++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  2 ++
>  drivers/net/ethernet/intel/i40e/i40e_type.h   | 10 --
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 +
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  2 ++
>  drivers/net/ethernet/intel/i40evf/i40e_type.h | 10 --
>  7 files changed, 50 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 28f547c..d9cb87f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -7073,6 +7073,8 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
> tx_ring->dcb_tc = 0;
> if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
> tx_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
> +   if (vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE)
> +   tx_ring->flags |= I40E_TXR_FLAGS_OUTER_UDP_CSUM;

Just curious... is there a difference between enabling the outer UDP
checksum (of a tunnel) and just enabling checksum offload for UDP
packets?

Tom

> vsi->tx_rings[i] = tx_ring;
>
> rx_ring = &tx_ring[1];
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 7d0a5ea..57dc5d2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1429,7 +1429,8 @@ static inline void i40e_rx_checksum(struct i40e_vsi 
> *vsi,
>  * so the total length of IPv4 header is IHL*4 bytes
>  * The UDP_0 bit *may* bet set if the *inner* header is UDP
>  */
> -   if (ipv4_tunnel) {
> +   if (!(vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE) &&
> +   (ipv4_tunnel)) {
> skb->transport_header = skb->mac_header +
> sizeof(struct ethhdr) +
> (ip_hdr(skb)->ihl * 4);
> @@ -2301,11 +2302,15 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
> u32 *tx_flags,
> struct iphdr *this_ip_hdr;
> u32 network_hdr_len;
> u8 l4_hdr = 0;
> +   struct udphdr *oudph;
> +   struct iphdr *oiph;
> u32 l4_tunnel = 0;
>
> if (skb->encapsulation) {
> switch (ip_hdr(skb)->protocol) {
> case IPPROTO_UDP:
> +   oudph = udp_hdr(skb);
> +   oiph = ip_hdr(skb);
> l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
> *tx_flags |= I40E_TX_FLAGS_VXLAN_TUNNEL;
> break;
> @@ -2342,6 +2347,15 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
> u32 *tx_flags,
> *tx_flags &= ~I40E_TX_FLAGS_IPV4;
> *tx_flags |= I40E_TX_FLAGS_IPV6;
> }
> +   if ((tx_ring->flags & I40E_TXR_FLAGS_OUTER_UDP_CSUM) &&
> +   (l4_tunnel == I40E_TXD_CTX_UDP_TUNNELING)&&
> +   (*cd_tunneling & I40E_TXD_CTX_QW0_EXT_IP_MASK)) {
> +   oudph->check = ~csum_tcpudp_magic(oiph->saddr,
> +   oiph->daddr,
> +   (skb->len - 
> skb_transport_offset(skb)),
> +   IPPROTO_UDP, 0);
> +   *cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
> +   }
> } else {
> network_hdr_len = skb_network_header_len(skb);
> this_ip_hdr = ip_hdr(skb);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index 0e40994..f1385a1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -267,6 +267,8 @@ struct i40e_ring {
>
> u16 flags;
>  #define I40E_TXR_FLAGS_WB_ON_ITR   BIT(0)
> +#define I40E_TXR_FLAGS_OUTER_UDP_CSUM  BIT(1)
> +
> /* stats structs */
> struct i40e_queue_stats stats;
> struct u64_stats_sync syncp;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h 
> b/drivers/net/ethernet/intel/i40e/i40e_type.h
> index 1ffd271..b93357d 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_type.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
> @@ -607,14 +607,18 @@ enum i40e_rx_desc_status_bits {
> I

[net-next 11/15] e1000e: Fix EEE in Sx implementation

2015-08-05 Thread Jeff Kirsher

From: Raanan Avargil 

This patch implements the EEE in Sx code so that it only applies to parts
that support EEE in Sx (as opposed to all parts that support EEE).
It also uses the existing eee_advert and eee_lp_ability to set just the
bits (100/1000) that should be set.

Signed-off-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index fea1601..b32bc48 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6317,6 +6317,33 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
return retval;
}
 
+   /* Ensure that the appropriate bits are set in LPI_CTRL
+* for EEE in Sx
+*/
+   if ((hw->phy.type >= e1000_phy_i217) &&
+   adapter->eee_advert && hw->dev_spec.ich8lan.eee_lp_ability) {
+   u16 lpi_ctrl = 0;
+
+   retval = hw->phy.ops.acquire(hw);
+   if (!retval) {
+   retval = e1e_rphy_locked(hw, I82579_LPI_CTRL,
+&lpi_ctrl);
+   if (!retval) {
+   if (adapter->eee_advert &
+   hw->dev_spec.ich8lan.eee_lp_ability &
+   I82579_EEE_100_SUPPORTED)
+   lpi_ctrl |= I82579_LPI_CTRL_100_ENABLE;
+   if (adapter->eee_advert &
+   hw->dev_spec.ich8lan.eee_lp_ability &
+   I82579_EEE_1000_SUPPORTED)
+   lpi_ctrl |= I82579_LPI_CTRL_1000_ENABLE;
+
+   retval = e1e_wphy_locked(hw, I82579_LPI_CTRL,
+lpi_ctrl);
+   }
+   }
+   hw->phy.ops.release(hw);
+   }
 
/* Release control of h/w to f/w.  If f/w is AMT enabled, this
 * would have already happened in close and is redundant.
-- 
2.4.3
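
The bit selection in the hunk above can be modelled in isolation: a speed is enabled in LPI_CTRL only when both the local advertisement and the link partner ability carry its EEE bit. The sketch below is a userspace model; lpi_ctrl_for_sx() is a hypothetical name, and the constant values are assumptions chosen for the example rather than quotations from the driver headers.

```c
#include <stdint.h>

/* Assumed values for illustration only; check the driver headers. */
#define EEE_100_SUPPORTED	(1u << 1)
#define EEE_1000_SUPPORTED	(1u << 2)
#define LPI_CTRL_100_ENABLE	0x2000u
#define LPI_CTRL_1000_ENABLE	0x4000u

/* Hypothetical model of the LPI_CTRL update done before entering Sx. */
uint16_t lpi_ctrl_for_sx(uint16_t lpi_ctrl, uint16_t advert,
			 uint16_t lp_ability)
{
	/* only speeds that both link partners support are enabled */
	uint16_t common = advert & lp_ability;

	if (common & EEE_100_SUPPORTED)
		lpi_ctrl |= LPI_CTRL_100_ENABLE;
	if (common & EEE_1000_SUPPORTED)
		lpi_ctrl |= LPI_CTRL_1000_ENABLE;

	return lpi_ctrl;
}
```

The model leaves out the PHY type check and the PHY acquire/release sequence, which the real code needs around the register access.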


[net-next 03/15] i40e/i40evf: Update FW API with X722 support

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

This patch does the firmware API update to support the new X722 device.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  48 ++
 drivers/net/ethernet/intel/i40e/i40e_common.c  | 163 +
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |  11 ++
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|  49 ++-
 drivers/net/ethernet/intel/i40evf/i40e_common.c| 163 +
 drivers/net/ethernet/intel/i40evf/i40e_prototype.h |  11 ++
 6 files changed, 444 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 9101f5c..95d23bf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -257,6 +257,10 @@ enum i40e_admin_queue_opc {
/* Tunnel commands */
i40e_aqc_opc_add_udp_tunnel = 0x0B00,
i40e_aqc_opc_del_udp_tunnel = 0x0B01,
+   i40e_aqc_opc_set_rss_key= 0x0B02,
+   i40e_aqc_opc_set_rss_lut= 0x0B03,
+   i40e_aqc_opc_get_rss_key= 0x0B04,
+   i40e_aqc_opc_get_rss_lut= 0x0B05,
 
/* Async Events */
i40e_aqc_opc_event_lan_overflow = 0x1001,
@@ -821,8 +825,12 @@ struct i40e_aqc_vsi_properties_data {
 I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT)
/* queueing option section */
u8  queueing_opt_flags;
+#define I40E_AQ_VSI_QUE_OPT_MULTICAST_UDP_ENA  0x04
+#define I40E_AQ_VSI_QUE_OPT_UNICAST_UDP_ENA0x08
 #define I40E_AQ_VSI_QUE_OPT_TCP_ENA0x10
 #define I40E_AQ_VSI_QUE_OPT_FCOE_ENA   0x20
+#define I40E_AQ_VSI_QUE_OPT_RSS_LUT_PF 0x00
+#define I40E_AQ_VSI_QUE_OPT_RSS_LUT_VSI0x40
u8  queueing_opt_reserved[3];
/* scheduler section */
u8  up_enable_bits;
@@ -2179,6 +2187,46 @@ struct i40e_aqc_del_udp_tunnel_completion {
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_del_udp_tunnel_completion);
 
+struct i40e_aqc_get_set_rss_key {
+#define I40E_AQC_SET_RSS_KEY_VSI_VALID (0x1 << 15)
+#define I40E_AQC_SET_RSS_KEY_VSI_ID_SHIFT  0
+#define I40E_AQC_SET_RSS_KEY_VSI_ID_MASK   (0x3FF << \
+   I40E_AQC_SET_RSS_KEY_VSI_ID_SHIFT)
+   __le16  vsi_id;
+   u8  reserved[6];
+   __le32  addr_high;
+   __le32  addr_low;
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_get_set_rss_key);
+
+struct i40e_aqc_get_set_rss_key_data {
+   u8 standard_rss_key[0x28];
+   u8 extended_hash_key[0xc];
+};
+
+I40E_CHECK_STRUCT_LEN(0x34, i40e_aqc_get_set_rss_key_data);
+
+struct  i40e_aqc_get_set_rss_lut {
+#define I40E_AQC_SET_RSS_LUT_VSI_VALID (0x1 << 15)
+#define I40E_AQC_SET_RSS_LUT_VSI_ID_SHIFT  0
+#define I40E_AQC_SET_RSS_LUT_VSI_ID_MASK   (0x3FF << \
+   I40E_AQC_SET_RSS_LUT_VSI_ID_SHIFT)
+   __le16  vsi_id;
+#define I40E_AQC_SET_RSS_LUT_TABLE_TYPE_SHIFT  0
+#define I40E_AQC_SET_RSS_LUT_TABLE_TYPE_MASK   (0x1 << \
+   I40E_AQC_SET_RSS_LUT_TABLE_TYPE_SHIFT)
+
+#define I40E_AQC_SET_RSS_LUT_TABLE_TYPE_VSI0
+#define I40E_AQC_SET_RSS_LUT_TABLE_TYPE_PF 1
+   __le16  flags;
+   u8  reserved[4];
+   __le32  addr_high;
+   __le32  addr_low;
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_get_set_rss_lut);
+
 /* tunnel key structure 0x0B10 */
 
 struct i40e_aqc_tunnel_key_structure {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 11ec264..114dc64 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -392,6 +392,169 @@ i40e_status i40e_aq_queue_shutdown(struct i40e_hw *hw,
return status;
 }
 
+/**
+ * i40e_aq_get_set_rss_lut
+ * @hw: pointer to the hardware structure
+ * @vsi_id: vsi fw index
+ * @pf_lut: for PF table set true, for VSI table set false
+ * @lut: pointer to the lut buffer provided by the caller
+ * @lut_size: size of the lut buffer
+ * @set: set true to set the table, false to get the table
+ *
+ * Internal function to get or set RSS look up table
+ **/
+static i40e_status i40e_aq_get_set_rss_lut(struct i40e_hw *hw,
+  u16 vsi_id, bool pf_lut,
+  u8 *lut, u16 lut_size,
+  bool set)
+{
+   i40e_status status;
+   struct i40e_aq_desc desc;
+   struct i40e_aqc_get_set_rss_lut *cmd_resp =
+  (struct i40e_aqc_get_set_rss_lut *)&desc.params.raw;
+
+   if (set)
+   i40e_fill_default_direct_cmd_desc(&desc,
+ i40e_aqc_opc_set_rss_lut);
+   else
+   i40e_fill_default_direct_cmd_desc(&d

[net-next 09/15] i40e/i40evf: Add ATR HW eviction support for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

X722 supports evicting ATR filters in the HW. With this patch, we enable
the feature in the driver and avoid filter deletion by the driver.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 10 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h |  4 
 2 files changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 57dc5d2..738aca6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2040,6 +2040,13 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
/* Due to lack of space, no more new filters can be programmed */
if (th->syn && (pf->auto_disable_flags & I40E_FLAG_FD_ATR_ENABLED))
return;
+   if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE) {
+   /* HW ATR eviction will take care of removing filters on FIN
+* and RST packets.
+*/
+   if (th->fin || th->rst)
+   return;
+   }
 
tx_ring->atr_count++;
 
@@ -2095,6 +2102,9 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
I40E_TXD_FLTR_QW1_CNTINDEX_SHIFT) &
I40E_TXD_FLTR_QW1_CNTINDEX_MASK;
 
+   if (pf->flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE)
+   dtype_cmd |= I40E_TXD_FLTR_QW1_ATR_MASK;
+
fdir_desc->qindex_flex_ptype_vsi = cpu_to_le32(flex_ptype);
fdir_desc->rsvd = cpu_to_le32(0);
fdir_desc->dtype_cmd_cntindex = cpu_to_le32(dtype_cmd);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index b93357d..61b6b11 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -1049,6 +1049,10 @@ enum i40e_filter_program_desc_pcmd {
 #define I40E_TXD_FLTR_QW1_FD_STATUS_MASK (0x3ULL << \
  I40E_TXD_FLTR_QW1_FD_STATUS_SHIFT)
 
+#define I40E_TXD_FLTR_QW1_ATR_SHIFT(0xEULL + \
+I40E_TXD_FLTR_QW1_CMD_SHIFT)
+#define I40E_TXD_FLTR_QW1_ATR_MASK BIT_ULL(I40E_TXD_FLTR_QW1_ATR_SHIFT)
+
 #define I40E_TXD_FLTR_QW1_CNTINDEX_SHIFT 20
 #define I40E_TXD_FLTR_QW1_CNTINDEX_MASK(0x1FFUL << \
 I40E_TXD_FLTR_QW1_CNTINDEX_SHIFT)
-- 
2.4.3


[net-next 05/15] i40e/i40evf: RSS changes for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

X722 uses the admin queue to configure RSS. This patch adds the necessary
flow changes to configure RSS through AQ. It also adds the separate VMDQ2
lookup tables and hash key programming for X722.

X722 also exposes a different set of PCTYPES for RSS; this patch
accommodates those changes.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Signed-off-by: Mitch Williams 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   7 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c| 156 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|  12 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h|  15 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  11 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  12 ++
 drivers/net/ethernet/intel/i40evf/i40e_type.h  |  15 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h |   2 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c| 166 -
 9 files changed, 307 insertions(+), 89 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 9914886..66d0780 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -79,10 +79,13 @@
 #define I40E_MIN_MSIX 2
 #define I40E_DEFAULT_NUM_VMDQ_VSI 8 /* max 256 VSIs */
 #define I40E_MIN_VSI_ALLOC51 /* LAN, ATR, FCOE, 32 VF, 16 VMDQ */
-#define I40E_DEFAULT_QUEUES_PER_VMDQ  2 /* max 16 qps */
+/* max 16 qps */
+#define i40e_default_queues_per_vmdq(pf) \
+   (((pf)->flags & I40E_FLAG_RSS_AQ_CAPABLE) ? 4 : 1)
 #define I40E_DEFAULT_QUEUES_PER_VF4
 #define I40E_DEFAULT_QUEUES_PER_TC1 /* should be a power of 2 */
-#define I40E_MAX_QUEUES_PER_TC64 /* should be a power of 2 */
+#define i40e_pf_get_max_q_per_tc(pf) \
+   (((pf)->flags & I40E_FLAG_128_QP_RSS_CAPABLE) ? 128 : 64)
 #define I40E_FDIR_RING0
 #define I40E_FDIR_RING_COUNT  32
 #ifdef I40E_FCOE
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3269b05..2e84165 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1550,7 +1550,7 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
 */
qcount = min_t(int, vsi->alloc_queue_pairs, pf->num_lan_msix);
num_tc_qps = qcount / numtc;
-   num_tc_qps = min_t(int, num_tc_qps, I40E_MAX_QUEUES_PER_TC);
+   num_tc_qps = min_t(int, num_tc_qps, i40e_pf_get_max_q_per_tc(pf));
 
/* Setup queue offset/count for all TCs for given VSI */
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
@@ -7469,62 +7469,139 @@ static int i40e_setup_misc_vector(struct i40e_pf *pf)
 }
 
 /**
- * i40e_config_rss - Prepare for RSS if used
+ * i40e_config_rss_aq - Prepare for RSS using AQ commands
+ * @vsi: vsi structure
+ * @seed: RSS hash seed
+ **/
+static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed)
+{
+   struct i40e_aqc_get_set_rss_key_data rss_key;
+   struct i40e_pf *pf = vsi->back;
+   struct i40e_hw *hw = &pf->hw;
+   bool pf_lut = false;
+   u8 *rss_lut;
+   int ret, i;
+
+   memset(&rss_key, 0, sizeof(rss_key));
+   memcpy(&rss_key, seed, sizeof(rss_key));
+
+   rss_lut = kzalloc(pf->rss_table_size, GFP_KERNEL);
+   if (!rss_lut)
+   return -ENOMEM;
+
+   /* Populate the LUT with max no. of queues in round robin fashion */
+   for (i = 0; i < vsi->rss_table_size; i++)
+   rss_lut[i] = i % vsi->rss_size;
+
+   ret = i40e_aq_set_rss_key(hw, vsi->id, &rss_key);
+   if (ret) {
+   dev_info(&pf->pdev->dev,
+"Cannot set RSS key, err %s aq_err %s\n",
+i40e_stat_str(&pf->hw, ret),
+i40e_aq_str(&pf->hw, pf->hw.aq.asq_last_status));
+   return ret;
+   }
+
+   if (vsi->type == I40E_VSI_MAIN)
+   pf_lut = true;
+
+   ret = i40e_aq_set_rss_lut(hw, vsi->id, pf_lut, rss_lut,
+ vsi->rss_table_size);
+   if (ret)
+   dev_info(&pf->pdev->dev,
+"Cannot set RSS lut, err %s aq_err %s\n",
+i40e_stat_str(&pf->hw, ret),
+i40e_aq_str(&pf->hw, pf->hw.aq.asq_last_status));
+
+   return ret;
+}
+
+/**
+ * i40e_vsi_config_rss - Prepare for VSI(VMDq) RSS if used
+ * @vsi: VSI structure
+ **/
+static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
+{
+   u8 seed[I40E_HKEY_ARRAY_SIZE];
+   struct i40e_pf *pf = vsi->back;
+
+   netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+   vsi->rss_size = min_t(int, pf->rss_size, vsi->num_queue_pairs);
+
+   if (pf->flags & I40E_FLAG_RSS_AQ_CAPABLE)
+   retur
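
The archived patch is cut off here. As a side note on the lookup table programming in i40e_config_rss_aq() above, the round-robin fill ("rss_lut[i] = i % vsi->rss_size") can be modelled in userspace as below. fill_rss_lut() and its error convention are hypothetical and only illustrate how table entries map onto queues; they are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: spread lut_size entries over rss_size queues in
 * round-robin order. Returns 0 on success, -1 if there are no queues.
 */
int fill_rss_lut(uint8_t *lut, size_t lut_size, uint16_t rss_size)
{
	size_t i;

	if (!rss_size)
		return -1;

	for (i = 0; i < lut_size; i++)
		lut[i] = (uint8_t)(i % rss_size);

	return 0;
}
```

With 8 entries and 3 queues the table reads 0 1 2 0 1 2 0 1, so every queue receives a near-equal share of the hash buckets.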

[net-next 07/15] i40e/i40evf: Add TX/RX outer UDP checksum support for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

X722 supports offloading of outer UDP TX and RX checksum for tunneled
packets. This patch exposes the support and leaves it enabled by
default.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 16 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 10 --
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  2 ++
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 10 --
 7 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 28f547c..d9cb87f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7073,6 +7073,8 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
tx_ring->dcb_tc = 0;
if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
tx_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
+   if (vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE)
+   tx_ring->flags |= I40E_TXR_FLAGS_OUTER_UDP_CSUM;
vsi->tx_rings[i] = tx_ring;
 
rx_ring = &tx_ring[1];
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 7d0a5ea..57dc5d2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1429,7 +1429,8 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 * so the total length of IPv4 header is IHL*4 bytes
 * The UDP_0 bit *may* bet set if the *inner* header is UDP
 */
-   if (ipv4_tunnel) {
+   if (!(vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE) &&
+   (ipv4_tunnel)) {
skb->transport_header = skb->mac_header +
sizeof(struct ethhdr) +
(ip_hdr(skb)->ihl * 4);
@@ -2301,11 +2302,15 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
struct iphdr *this_ip_hdr;
u32 network_hdr_len;
u8 l4_hdr = 0;
+   struct udphdr *oudph;
+   struct iphdr *oiph;
u32 l4_tunnel = 0;
 
if (skb->encapsulation) {
switch (ip_hdr(skb)->protocol) {
case IPPROTO_UDP:
+   oudph = udp_hdr(skb);
+   oiph = ip_hdr(skb);
l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
*tx_flags |= I40E_TX_FLAGS_VXLAN_TUNNEL;
break;
@@ -2342,6 +2347,15 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
*tx_flags &= ~I40E_TX_FLAGS_IPV4;
*tx_flags |= I40E_TX_FLAGS_IPV6;
}
+   if ((tx_ring->flags & I40E_TXR_FLAGS_OUTER_UDP_CSUM) &&
+   (l4_tunnel == I40E_TXD_CTX_UDP_TUNNELING)&&
+   (*cd_tunneling & I40E_TXD_CTX_QW0_EXT_IP_MASK)) {
+   oudph->check = ~csum_tcpudp_magic(oiph->saddr,
+   oiph->daddr,
+   (skb->len - skb_transport_offset(skb)),
+   IPPROTO_UDP, 0);
+   *cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
+   }
} else {
network_hdr_len = skb_network_header_len(skb);
this_ip_hdr = ip_hdr(skb);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 0e40994..f1385a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -267,6 +267,8 @@ struct i40e_ring {
 
u16 flags;
 #define I40E_TXR_FLAGS_WB_ON_ITR   BIT(0)
+#define I40E_TXR_FLAGS_OUTER_UDP_CSUM  BIT(1)
+
/* stats structs */
struct i40e_queue_stats stats;
struct u64_stats_sync syncp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 1ffd271..b93357d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -607,14 +607,18 @@ enum i40e_rx_desc_status_bits {
I40E_RX_DESC_STATUS_CRCP_SHIFT  = 4,
I40E_RX_DESC_STATUS_TSYNINDX_SHIFT  = 5, /* 2 BITS */
I40E_RX_DESC_STATUS_TSYNVALID_SHIFT = 7,
-   I40E_RX_DESC_STATUS_PIF_SHIFT   = 8,
+   /* Note: Bit 8 is reserved in X710 and XL710 */
+   I40E_RX_DESC_STATUS_EXT_UDP_0_SHIFT = 8,
I40E_RX_DESC_STATUS_UMBCAST_SHIFT   = 9, /* 2 BITS */
I40E_RX_DE

[net-next 14/15] e1000e: Fix tight loop implementation of systime read algorithm

2015-08-05 Thread Jeff Kirsher

From: Raanan Avargil 

Change the algorithm. Read systimel twice and check for overflow.
If there was no overflow, use the first value.
If there was an overflow, read systimeh again and use the second
systimel value.

Signed-off-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 31 --
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 24b7269..96a8166 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4280,18 +4280,29 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc)
struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
 cc);
struct e1000_hw *hw = &adapter->hw;
+   u32 systimel_1, systimel_2, systimeh;
cycle_t systim, systim_next;
-   /* SYSTIMH latching upon SYSTIML read does not work well. To fix that
-* we don't want to allow overflow of SYSTIML and a change to SYSTIMH
-* to occur between reads, so if we read a vale close to overflow, we
-* wait for overflow to occur and read both registers when its safe.
+   /* SYSTIMH latching upon SYSTIML read does not work well.
+* This means that if SYSTIML overflows after we read it but before
+* we read SYSTIMH, the value of SYSTIMH has been incremented and we
+* will experience a huge non linear increment in the systime value
+* to fix that we test for overflow and if true, we re-read systime.
 */
-   u32 systim_overflow_latch_fix = 0x3FFF;
-
-   do {
-   systim = (cycle_t)er32(SYSTIML);
-   } while (systim > systim_overflow_latch_fix);
-   systim |= (cycle_t)er32(SYSTIMH) << 32;
+   systimel_1 = er32(SYSTIML);
+   systimeh = er32(SYSTIMH);
+   systimel_2 = er32(SYSTIML);
+   /* Check for overflow. If there was no overflow, use the values */
+   if (systimel_1 < systimel_2) {
+   systim = (cycle_t)systimel_1;
+   systim |= (cycle_t)systimeh << 32;
+   } else {
+   /* There was an overflow, read again SYSTIMH, and use
+* systimel_2
+*/
+   systimeh = er32(SYSTIMH);
+   systim = (cycle_t)systimel_2;
+   systim |= (cycle_t)systimeh << 32;
+   }
 
if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
u64 incvalue, time_delta, rem, temp;
-- 
2.4.3
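
The read sequence described in the commit message can be exercised against a simulated counter. In the sketch below, struct fake_clock and the read_systiml()/read_systimh() accessors are hypothetical stand-ins for the SYSTIML/SYSTIMH registers; every register read advances the fake clock by "step" ticks so that a low-word wrap can happen between reads. It is a userspace model of the algorithm, not driver code.

```c
#include <stdint.h>

/* Hypothetical free-running 64-bit counter exposed as two 32-bit halves. */
struct fake_clock {
	uint64_t ticks;
	uint64_t step;	/* ticks that elapse per register read */
};

uint32_t read_systiml(struct fake_clock *c)
{
	uint32_t v = (uint32_t)c->ticks;

	c->ticks += c->step;
	return v;
}

uint32_t read_systimh(struct fake_clock *c)
{
	uint32_t v = (uint32_t)(c->ticks >> 32);

	c->ticks += c->step;
	return v;
}

/* Read low, high, low. If the low word did not wrap between the two
 * low reads, pair the first low value with the high value. Otherwise
 * re-read the high word and pair it with the second low value.
 */
uint64_t read_systim(struct fake_clock *c)
{
	uint32_t lo1 = read_systiml(c);
	uint32_t hi = read_systimh(c);
	uint32_t lo2 = read_systiml(c);

	if (lo1 < lo2)
		return ((uint64_t)hi << 32) | lo1;

	hi = read_systimh(c);
	return ((uint64_t)hi << 32) | lo2;
}
```

Starting the fake clock at 0x1FFFFFFFF shows the case the patch targets: pairing the first low word with the already-incremented high word would report a value almost 2^32 ticks in the future, while the re-read path returns a value just past the wrap.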


[net-next 08/15] i40e: Add IWARP support for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

X722 supports IWARP; this patch handles checking for PE critical errors.
Since the driver doesn't support the IWARP interface for now, this patch
just does the bare minimum to log a message if a PE critical error
happens.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d9cb87f..3bb832a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2908,6 +2908,9 @@ static void i40e_enable_misc_int_causes(struct i40e_pf *pf)
  I40E_PFINT_ICR0_ENA_VFLR_MASK  |
  I40E_PFINT_ICR0_ENA_ADMINQ_MASK;
 
+   if (pf->flags & I40E_FLAG_IWARP_ENABLED)
+   val |= I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK;
+
if (pf->flags & I40E_FLAG_PTP)
val |= I40E_PFINT_ICR0_ENA_TIMESYNC_MASK;
 
@@ -3198,6 +3201,13 @@ static irqreturn_t i40e_intr(int irq, void *data)
(icr0 & I40E_PFINT_ICR0_SWINT_MASK))
pf->sw_int_count++;
 
+   if ((pf->flags & I40E_FLAG_IWARP_ENABLED) &&
+   (ena_mask & I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK)) {
+   ena_mask &= ~I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK;
+   icr0 &= ~I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK;
+   dev_info(&pf->pdev->dev, "cleared PE_CRITERR\n");
+   }
+
/* only q0 is used in MSI/Legacy mode, and none are used in MSIX */
if (icr0 & I40E_PFINT_ICR0_QUEUE_0_MASK) {
 
-- 
2.4.3


[net-next 15/15] e1000e: Increase driver version number

2015-08-05 Thread Jeff Kirsher

From: Raanan Avargil 

Signed-off-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 96a8166..546b5da 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -48,7 +48,7 @@
 
 #define DRV_EXTRAVERSION "-k"
 
-#define DRV_VERSION "3.2.5" DRV_EXTRAVERSION
+#define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;
 
-- 
2.4.3


[net-next 13/15] e1000e: Fix incorrect ASPM locking

2015-08-05 Thread Jeff Kirsher

From: Raanan Avargil 

This patch fixes incorrect locking usage.
In the context of slot reset, we should use the lock,
and during resume there is no need for the lock.

Reported-by: Bjorn Helgaas 
Signed-off-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index b32bc48..24b7269 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6493,7 +6493,7 @@ static int __e1000_resume(struct pci_dev *pdev)
if (adapter->flags2 & FLAG2_DISABLE_ASPM_L1)
aspm_disable_flag |= PCIE_LINK_STATE_L1;
if (aspm_disable_flag)
-   e1000e_disable_aspm_locked(pdev, aspm_disable_flag);
+   e1000e_disable_aspm(pdev, aspm_disable_flag);
 
pci_set_master(pdev);
 
@@ -6771,7 +6771,7 @@ static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
if (adapter->flags2 & FLAG2_DISABLE_ASPM_L1)
aspm_disable_flag |= PCIE_LINK_STATE_L1;
if (aspm_disable_flag)
-   e1000e_disable_aspm(pdev, aspm_disable_flag);
+   e1000e_disable_aspm_locked(pdev, aspm_disable_flag);
 
err = pci_enable_device_mem(pdev);
if (err) {
-- 
2.4.3


[net-next 02/15] i40e/i40evf: Add flags for X722 capabilities

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

Add capabilities flags specific to X722.

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  | 7 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c | 8 
 drivers/net/ethernet/intel/i40evf/i40evf.h  | 7 ++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 281fd84..9914886 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -298,6 +298,7 @@ struct i40e_pf {
 #define I40E_FLAG_VMDQ_ENABLED BIT_ULL(7)
 #define I40E_FLAG_FDIR_REQUIRES_REINIT BIT_ULL(8)
 #define I40E_FLAG_NEED_LINK_UPDATE BIT_ULL(9)
+#define I40E_FLAG_IWARP_ENABLEDBIT_ULL(10)
 #ifdef I40E_FCOE
 #define I40E_FLAG_FCOE_ENABLED BIT_ULL(11)
 #endif /* I40E_FCOE */
@@ -318,6 +319,12 @@ struct i40e_pf {
 #endif
 #define I40E_FLAG_PORT_ID_VALIDBIT_ULL(28)
 #define I40E_FLAG_DCB_CAPABLE  BIT_ULL(29)
+#define I40E_FLAG_RSS_AQ_CAPABLE   BIT_ULL(31)
+#define I40E_FLAG_HW_ATR_EVICT_CAPABLE BIT_ULL(32)
+#define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE   BIT_ULL(33)
+#define I40E_FLAG_128_QP_RSS_CAPABLE   BIT_ULL(34)
+#define I40E_FLAG_WB_ON_ITR_CAPABLEBIT_ULL(35)
+#define I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE  BIT_ULL(38)
 #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40)
 
/* tracks features that get auto disabled by errors */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 9c96706..3269b05 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7785,6 +7785,14 @@ static int i40e_sw_init(struct i40e_pf *pf)
I40E_MAX_VF_COUNT);
}
 #endif /* CONFIG_PCI_IOV */
+   if (pf->hw.mac.type == I40E_MAC_X722) {
+   pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE |
+I40E_FLAG_128_QP_RSS_CAPABLE |
+I40E_FLAG_HW_ATR_EVICT_CAPABLE |
+I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
+I40E_FLAG_WB_ON_ITR_CAPABLE |
+I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE;
+   }
pf->eeprom_version = 0xDEAD;
pf->lan_veb = I40E_NO_VEB;
pf->lan_vsi = I40E_NO_VSI;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index c33c7cc..bd227b3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -218,11 +218,15 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_PF_COMMS_FAILED  BIT(8)
 #define I40EVF_FLAG_RESET_PENDINGBIT(9)
 #define I40EVF_FLAG_RESET_NEEDED BIT(10)
-/* duplcates for common code */
+#define I40EVF_FLAG_WB_ON_ITR_CAPABLE  BIT(11)
+#define I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE BIT(12)
+/* duplicates for common code */
 #define I40E_FLAG_FDIR_ATR_ENABLED  0
 #define I40E_FLAG_DCB_ENABLED   0
 #define I40E_FLAG_IN_NETPOLLI40EVF_FLAG_IN_NETPOLL
 #define I40E_FLAG_RX_CSUM_ENABLEDI40EVF_FLAG_RX_CSUM_ENABLED
+#define I40E_FLAG_WB_ON_ITR_CAPABLEI40EVF_FLAG_WB_ON_ITR_CAPABLE
+#define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE   I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE
/* flags for admin queue service task */
u32 aq_required;
 #define I40EVF_FLAG_AQ_ENABLE_QUEUES   BIT(0)
@@ -234,6 +238,7 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_AQ_CONFIGURE_QUEUESBIT(6)
 #define I40EVF_FLAG_AQ_MAP_VECTORS BIT(7)
 #define I40EVF_FLAG_AQ_HANDLE_RESETBIT(8)
+#define I40EVF_FLAG_AQ_CONFIGURE_RSS   BIT(9)
 #define I40EVF_FLAG_AQ_GET_CONFIG  BIT(10)
 
/* OS defined structs */
-- 
2.4.3

[net-next 10/15] i40e: Add AQ commands for NVM Update for X722

2015-08-05 Thread Jeff Kirsher

From: Shannon Nelson 

X722 does NVM update via the adminq queue, so we need to add support for
that.
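
The range checks that the new AQ read helper in this patch performs before issuing the command can be sketched in standalone C. This is only an illustration, not the driver code: sr_request_ok() is a hypothetical name, the sector size of 0x800 words is an assumption derived from the "4KB (one sector)" comment in the patch, and the zero-length guard is an addition that the patch does not have.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one 4KB sector holds 2048 16-bit words. */
#define SR_SECTOR_SIZE_IN_WORDS 0x800

/* Hypothetical helper: returns true when a request for `words` 16-bit
 * words starting at word `offset` stays inside a Shadow RAM of `sr_size`
 * words, is no larger than one sector, and does not cross a sector
 * boundary.
 */
static bool sr_request_ok(uint32_t offset, uint16_t words, uint32_t sr_size)
{
	if (words == 0)
		return false;	/* extra guard, not part of the patch */
	if (offset + words > sr_size)
		return false;	/* beyond the Shadow RAM limit */
	if (words > SR_SECTOR_SIZE_IN_WORDS)
		return false;	/* more than one sector per command */
	if ((offset + (words - 1)) / SR_SECTOR_SIZE_IN_WORDS !=
	    offset / SR_SECTOR_SIZE_IN_WORDS)
		return false;	/* request spreads over two sectors */
	return true;
}
```

A two-word request starting at word 0x7FF is rejected because it would touch both sector 0 and sector 1, while a full-sector request starting at 0x800 is accepted.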

Signed-off-by: Shannon Nelson 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 129 +
 1 file changed, 129 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index ce986af..9b83abc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -212,6 +212,74 @@ read_nvm_exit:
 }
 
 /**
+ * i40e_read_nvm_aq - Read Shadow RAM.
+ * @hw: pointer to the HW structure.
+ * @module_pointer: module pointer location in words from the NVM beginning
+ * @offset: offset in words from module start
+ * @words: number of words to write
+ * @data: buffer with words to write to the Shadow RAM
+ * @last_command: tells the AdminQ that this is the last command
+ *
+ * Writes a 16 bit words buffer to the Shadow RAM using the admin command.
+ **/
+static i40e_status i40e_read_nvm_aq(struct i40e_hw *hw, u8 module_pointer,
+   u32 offset, u16 words, void *data,
+   bool last_command)
+{
+   i40e_status ret_code = I40E_ERR_NVM;
+   struct i40e_asq_cmd_details cmd_details;
+
+   memset(&cmd_details, 0, sizeof(cmd_details));
+
+   /* Here we are checking the SR limit only for the flat memory model.
+* We cannot do it for the module-based model, as we did not acquire
+* the NVM resource yet (we cannot get the module pointer value).
+* Firmware will check the module-based model.
+*/
+   if ((offset + words) > hw->nvm.sr_size)
+   i40e_debug(hw, I40E_DEBUG_NVM,
+  "NVM write error: offset %d beyond Shadow RAM limit 
%d\n",
+  (offset + words), hw->nvm.sr_size);
+   else if (words > I40E_SR_SECTOR_SIZE_IN_WORDS)
+   /* We can write only up to 4KB (one sector), in one AQ write */
+   i40e_debug(hw, I40E_DEBUG_NVM,
+  "NVM write fail error: tried to write %d words, 
limit is %d.\n",
+  words, I40E_SR_SECTOR_SIZE_IN_WORDS);
+   else if (((offset + (words - 1)) / I40E_SR_SECTOR_SIZE_IN_WORDS)
+!= (offset / I40E_SR_SECTOR_SIZE_IN_WORDS))
+   /* A single write cannot spread over two sectors */
+   i40e_debug(hw, I40E_DEBUG_NVM,
+  "NVM write error: cannot spread over two sectors in 
a single write offset=%d words=%d\n",
+  offset, words);
+   else
+   ret_code = i40e_aq_read_nvm(hw, module_pointer,
+   2 * offset,  /*bytes*/
+   2 * words,   /*bytes*/
+   data, last_command, &cmd_details);
+
+   return ret_code;
+}
+
+/**
+ * i40e_read_nvm_word_aq - Reads Shadow RAM via AQ
+ * @hw: pointer to the HW structure
+ * @offset: offset of the Shadow RAM word to read (0x00 - 0x001FFF)
+ * @data: word read from the Shadow RAM
+ *
+ * Reads one 16 bit word from the Shadow RAM using the GLNVM_SRCTL register.
+ **/
+static i40e_status i40e_read_nvm_word_aq(struct i40e_hw *hw, u16 offset,
+u16 *data)
+{
+   i40e_status ret_code = I40E_ERR_TIMEOUT;
+
+   ret_code = i40e_read_nvm_aq(hw, 0x0, offset, 1, data, true);
+   *data = le16_to_cpu(*(__le16 *)data);
+
+   return ret_code;
+}
+
+/**
  * i40e_read_nvm_word - Reads Shadow RAM
  * @hw: pointer to the HW structure
  * @offset: offset of the Shadow RAM word to read (0x00 - 0x001FFF)
@@ -222,6 +290,8 @@ read_nvm_exit:
 i40e_status i40e_read_nvm_word(struct i40e_hw *hw, u16 offset,
   u16 *data)
 {
+   if (hw->mac.type == I40E_MAC_X722)
+   return i40e_read_nvm_word_aq(hw, offset, data);
return i40e_read_nvm_word_srctl(hw, offset, data);
 }
 
@@ -257,6 +327,63 @@ static i40e_status i40e_read_nvm_buffer_srctl(struct i40e_hw *hw, u16 offset,
 }
 
 /**
+ * i40e_read_nvm_buffer_aq - Reads Shadow RAM buffer via AQ
+ * @hw: pointer to the HW structure
+ * @offset: offset of the Shadow RAM word to read (0x00 - 0x001FFF).
+ * @words: (in) number of words to read; (out) number of words actually read
+ * @data: words read from the Shadow RAM
+ *
+ * Reads 16 bit words (data buffer) from the SR using the i40e_read_nvm_aq()
+ * method. The buffer read is preceded by the NVM ownership take
+ * and followed by the release.
+ **/
+static i40e_status i40e_read_nvm_buffer_aq(struct i40e_hw *hw, u16 offset,
+  u16 *words, u16 *data)
+{
+   i40e_status ret_code;
+   u16 read_size = *words;
+   bool last_cmd = false;
+   u16 words_read = 0;
+   u16 i = 0

[net-next 12/15] e1000e: Cosmetic changes

2015-08-05 Thread Jeff Kirsher

From: Raanan Avargil 

1) Replace spaces with tab.
2) Move ich8lan related define to the proper context.

Signed-off-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/ich8lan.h | 4 ++--
 drivers/net/ethernet/intel/e1000e/regs.h| 5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h
index 2645985..34c551e 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.h
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h
@@ -106,14 +106,14 @@
 #define E1000_FEXTNVM11_DISABLE_MULR_FIX   0x2000
 
 /* bit24: RXDCTL thresholds granularity: 0 - cache lines, 1 - descriptors */
-#define E1000_RXDCTL_THRESH_UNIT_DESC 0x0100
+#define E1000_RXDCTL_THRESH_UNIT_DESC  0x0100
 
 #define K1_ENTRY_LATENCY   0
 #define K1_MIN_TIME1
 #define NVM_SIZE_MULTIPLIER 4096   /*multiplier for NVMS field */
 #define E1000_FLASH_BASE_ADDR 0xE000   /*offset of NVM access regs */
 #define E1000_CTRL_EXT_NVMVS 0x3   /*NVM valid sector */
-
+#define E1000_TARC0_CB_MULTIQ_3_REQ(1 << 28 | 1 << 29)
 #define PCIE_ICH8_SNOOP_ALLPCIE_NO_SNOOP_ALL
 
 #define E1000_ICH_RAR_ENTRIES  7
diff --git a/drivers/net/ethernet/intel/e1000e/regs.h b/drivers/net/ethernet/intel/e1000e/regs.h
index b24e5fe..1d5e0b7 100644
--- a/drivers/net/ethernet/intel/e1000e/regs.h
+++ b/drivers/net/ethernet/intel/e1000e/regs.h
@@ -38,8 +38,8 @@
 #define E1000_FEXTNVM4 0x00024 /* Future Extended NVM 4 - RW */
 #define E1000_FEXTNVM6 0x00010 /* Future Extended NVM 6 - RW */
 #define E1000_FEXTNVM7 0x000E4 /* Future Extended NVM 7 - RW */
-#define E1000_FEXTNVM9 0x5BB4  /* Future Extended NVM 9 - RW */
-#define E1000_FEXTNVM110x5BBC  /* Future Extended NVM 11 - RW */
+#define E1000_FEXTNVM9 0x5BB4  /* Future Extended NVM 9 - RW */
+#define E1000_FEXTNVM110x5BBC  /* Future Extended NVM 11 - RW */
 #define E1000_PCIEANACFG   0x00F18 /* PCIE Analog Config */
 #define E1000_FCT  0x00030 /* Flow Control Type - RW */
 #define E1000_VET  0x00038 /* VLAN Ether Type - RW */
@@ -125,7 +125,6 @@
 (0x054E4 + ((_i - 16) * 8)))
 #define E1000_SHRAL(_i)(0x05438 + ((_i) * 8))
 #define E1000_SHRAH(_i)(0x0543C + ((_i) * 8))
-#define E1000_TARC0_CB_MULTIQ_3_REQ(1 << 28 | 1 << 29)
 #define E1000_TDFH 0x03410 /* Tx Data FIFO Head - RW */
 #define E1000_TDFT 0x03418 /* Tx Data FIFO Tail - RW */
 #define E1000_TDFHS0x03420 /* Tx Data FIFO Head Saved - RW */
-- 
2.4.3

[net-next 06/15] i40e/i40evf: Add support for writeback on ITR feature for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

X722 fixes an issue from X710 where TX descriptor WB would not happen if
the interrupts were disabled. In order for the write backs to happen a
bit needs to be set in the dynamic interrupt control register called
WB_ON_ITR. With this feature, the SW driver need not arm SW interrupts to
work around the issue in X710.
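
The arm-once behaviour described above can be sketched in standalone C. This is a simplified model under stated assumptions, not the driver code: the demo_* names, the bit position, and the plain struct field standing in for the dynamic interrupt control register are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WB_ON_ITR_BIT (1u << 30)	/* hypothetical bit position */

struct demo_q_vector {
	bool arm_wb_state;	 /* true once write-back has been armed */
	uint32_t dyn_ctl;	 /* stand-in for the control register */
	unsigned int reg_writes; /* counts simulated register writes */
};

/* Arm write-back on ITR once; later calls are no-ops until re-armed. */
static void demo_force_wb(struct demo_q_vector *qv)
{
	if (qv->arm_wb_state)
		return;
	qv->dyn_ctl = DEMO_WB_ON_ITR_BIT;
	qv->reg_writes++;
	qv->arm_wb_state = true;
}

/* Called when polling completes, so the next poll cycle can arm again. */
static void demo_poll_done(struct demo_q_vector *qv)
{
	qv->arm_wb_state = false;
}
```

Tracking the armed state in the vector avoids a register write on every poll iteration; only the first request after polling completes touches the register.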

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h|  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 46 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  2 ++
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 38 --
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  2 ++
 drivers/net/ethernet/intel/i40evf/i40evf.h|  1 +
 7 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 66d0780..0f97883 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -560,6 +560,7 @@ struct i40e_q_vector {
cpumask_t affinity_mask;
struct rcu_head rcu;/* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
+   bool arm_wb_state;
 } cacheline_internodealigned_in_smp;
 
 /* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e84165..28f547c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7071,6 +7071,8 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
tx_ring->count = vsi->num_desc;
tx_ring->size = 0;
tx_ring->dcb_tc = 0;
+   if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
+   tx_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
vsi->tx_rings[i] = tx_ring;
 
rx_ring = &tx_ring[1];
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 330e4ef..7d0a5ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -853,15 +853,40 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
  **/
 static void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector)
 {
-   u32 val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- I40E_PFINT_DYN_CTLN_ITR_INDX_MASK | /* set noitr */
- I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
- I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK;
- /* allow 00 to be written to the index */
-
-   wr32(&vsi->back->hw,
-I40E_PFINT_DYN_CTLN(q_vector->v_idx + vsi->base_vector - 1),
-val);
+   u16 flags = q_vector->tx.ring[0].flags;
+
+   if (flags & I40E_TXR_FLAGS_WB_ON_ITR) {
+   u32 val;
+
+   if (q_vector->arm_wb_state)
+   return;
+
+   val = I40E_PFINT_DYN_CTLN_WB_ON_ITR_MASK;
+
+   wr32(&vsi->back->hw,
+I40E_PFINT_DYN_CTLN(q_vector->v_idx +
+vsi->base_vector - 1),
+val);
+   q_vector->arm_wb_state = true;
+   } else if (vsi->back->flags & I40E_FLAG_MSIX_ENABLED) {
+   u32 val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
+ I40E_PFINT_DYN_CTLN_ITR_INDX_MASK | /* set noitr */
+ I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK;
+ /* allow 00 to be written to the index */
+
+   wr32(&vsi->back->hw,
+I40E_PFINT_DYN_CTLN(q_vector->v_idx +
+vsi->base_vector - 1), val);
+   } else {
+   u32 val = I40E_PFINT_DYN_CTL0_INTENA_MASK |
+ I40E_PFINT_DYN_CTL0_ITR_INDX_MASK | /* set noitr */
+ I40E_PFINT_DYN_CTL0_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTL0_SW_ITR_INDX_ENA_MASK;
+   /* allow 00 to be written to the index */
+
+   wr32(&vsi->back->hw, I40E_PFINT_DYN_CTL0, val);
+   }
 }
 
 /**
@@ -1918,6 +1943,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
return budget;
}
 
+   if (vsi->back->flags & I40E_TXR_FLAGS_WB_ON_ITR)
+   q_vector->arm_wb_state = false;
+
/* Work is done so exit the polling mode and re-enable the interrupt */
napi_complete(napi);
if (vsi->back->flags & I40E_FLAG_MSIX_ENABLED) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 8b618d0..0e40994 100644
--- a/drivers/net/ethernet/

[net-next 01/15] i40e/i40evf: Add device ids for X722

2015-08-05 Thread Jeff Kirsher

From: Anjali Singhai Jain 

Adding device ids for new hardware X722

Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Catherine Sullivan 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 10 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c |  3 +++
 drivers/net/ethernet/intel/i40e/i40e_type.h | 10 +-
 drivers/net/ethernet/intel/i40evf/i40e_common.c |  9 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h   | 10 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  1 +
 6 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 167ca0d..11ec264 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -54,6 +54,15 @@ static i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_20G_KR2:
hw->mac.type = I40E_MAC_XL710;
break;
+   case I40E_DEV_ID_SFP_X722:
+   case I40E_DEV_ID_1G_BASE_T_X722:
+   case I40E_DEV_ID_10G_BASE_T_X722:
+   hw->mac.type = I40E_MAC_X722;
+   break;
+   case I40E_DEV_ID_X722_VF:
+   case I40E_DEV_ID_X722_VF_HV:
+   hw->mac.type = I40E_MAC_X722_VF;
+   break;
case I40E_DEV_ID_VF:
case I40E_DEV_ID_VF_HV:
hw->mac.type = I40E_MAC_VF;
@@ -769,6 +778,7 @@ i40e_status i40e_init_shared_code(struct i40e_hw *hw)
 
switch (hw->mac.type) {
case I40E_MAC_XL710:
+   case I40E_MAC_X722:
break;
default:
return I40E_ERR_DEVICE_NOT_SUPPORTED;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 857d294..9c96706 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -76,6 +76,9 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_C), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_X722), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_1G_BASE_T_X722), 0},
+   {PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_X722), 0},
/* required last entry */
{0, }
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index a20128b..778266f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -47,6 +47,11 @@
 #define I40E_DEV_ID_20G_KR20x1587
 #define I40E_DEV_ID_VF 0x154C
 #define I40E_DEV_ID_VF_HV  0x1571
+#define I40E_DEV_ID_SFP_X722   0x37D0
+#define I40E_DEV_ID_1G_BASE_T_X722 0x37D1
+#define I40E_DEV_ID_10G_BASE_T_X7220x37D2
+#define I40E_DEV_ID_X722_VF0x37CD
+#define I40E_DEV_ID_X722_VF_HV 0x37D9
 
 #define i40e_is_40G_device(d)  ((d) == I40E_DEV_ID_QSFP_A  || \
 (d) == I40E_DEV_ID_QSFP_B  || \
@@ -120,6 +125,8 @@ enum i40e_mac_type {
I40E_MAC_X710,
I40E_MAC_XL710,
I40E_MAC_VF,
+   I40E_MAC_X722,
+   I40E_MAC_X722_VF,
I40E_MAC_GENERIC,
 };
 
@@ -502,7 +509,8 @@ struct i40e_hw {
 
 static inline bool i40e_is_vf(struct i40e_hw *hw)
 {
-   return hw->mac.type == I40E_MAC_VF;
+   return (hw->mac.type == I40E_MAC_VF ||
+   hw->mac.type == I40E_MAC_X722_VF);
 }
 
 struct i40e_driver_version {
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 56c7e75..eb54e8d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -54,6 +54,15 @@ i40e_status i40e_set_mac_type(struct i40e_hw *hw)
case I40E_DEV_ID_20G_KR2:
hw->mac.type = I40E_MAC_XL710;
break;
+   case I40E_DEV_ID_SFP_X722:
+   case I40E_DEV_ID_1G_BASE_T_X722:
+   case I40E_DEV_ID_10G_BASE_T_X722:
+   hw->mac.type = I40E_MAC_X722;
+   break;
+   case I40E_DEV_ID_X722_VF:
+   case I40E_DEV_ID_X722_VF_HV:
+   hw->mac.type = I40E_MAC_X722_VF;
+   break;
case I40E_DEV_ID_VF:
case I40E_DEV_ID_VF_HV:
hw->mac.type = I40E_MAC_VF;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 4ba9a01..c50536b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf

[net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-08-05

2015-08-05 Thread Jeff Kirsher

This series contains updates to i40e, i40evf and e1000e.

Anjali adds support for X722 devices to i40e and i40evf.  With the added
support, X722 supports offloading of the outer UDP transmit and receive
checksum for tunneled packets.  X722 also supports evicting ATR filters in
the hardware, so the driver is updated to use this new feature set.

Raanan provides several fixes for e1000e.  The first rectifies the Energy
Efficient Ethernet in Sx code so that it only applies to parts that
actually support EEE in Sx.  The others fix whitespace and move an ICH8
related define to the proper context, fix the ASPM locking issue which was
reported by Bjorn Helgaas, and fix a workaround implementation for systime
which could experience a large non-linear increment of the systime value
when checking for overflow.

The following are changes since commit 9dc20a649609c95ce7c5ac4282656ba627b67d49:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue master

Anjali Singhai Jain (9):
  i40e/i40evf: Add device ids for X722
  i40e/i40evf: Add flags for X722 capabilities
  i40e/i40evf: Update FW API with X722 support
  i40e/i40evf: Update register.h file for X722
  i40e/i40evf: RSS changes for X722
  i40e/i40evf: Add support for writeback on ITR feature for X722
  i40e/i40evf: Add TX/RX outer UDP checksum support for X722
  i40e: Add IWARP support for X722
  i40e/i40evf: Add ATR HW eviction support for X722

Raanan Avargil (5):
  e1000e: Fix EEE in Sx implementation
  e1000e: Cosmetic changes
  e1000e: Fix incorrect ASPM locking
  e1000e: Fix tight loop implementation of systime read algorithm
  e1000e: Increase driver version number

Shannon Nelson (1):
  i40e: Add AQ commands for NVM Update for X722

 drivers/net/ethernet/intel/e1000e/ich8lan.h|4 +-
 drivers/net/ethernet/intel/e1000e/netdev.c |   64 +-
 drivers/net/ethernet/intel/e1000e/regs.h   |5 +-
 drivers/net/ethernet/intel/i40e/i40e.h |   15 +-
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   48 +
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  173 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c|  181 +-
 drivers/net/ethernet/intel/i40e/i40e_nvm.c |  129 ++
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   11 +
 drivers/net/ethernet/intel/i40e/i40e_register.h| 1931 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   72 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|   16 +
 drivers/net/ethernet/intel/i40e/i40e_type.h|   39 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   11 +-
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|   49 +-
 drivers/net/ethernet/intel/i40evf/i40e_common.c|  172 ++
 drivers/net/ethernet/intel/i40evf/i40e_prototype.h |   11 +
 drivers/net/ethernet/intel/i40evf/i40e_register.h  |   62 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |   51 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |   16 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h  |   35 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h |   10 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  167 +-
 23 files changed, 3136 insertions(+), 136 deletions(-)

-- 
2.4.3

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Stephen Hemminger

Something like this:

Subject: [PATCH net-next] sky2: use random address if EEPROM is bad

On some embedded systems the EEPROM does not contain a valid MAC address.
In that case it is better to fall back to a generated MAC address and
let init scripts fix the value later.
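
The fallback idea can be sketched in userspace C. This is an illustration only: the mac_* helpers are hypothetical stand-ins for the kernel's validity check and random address generation, and rand() is used just to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Valid unicast address: not all zeroes and multicast bit clear. */
static bool mac_is_valid(const uint8_t addr[ETH_ALEN])
{
	bool all_zero = true;

	for (int i = 0; i < ETH_ALEN; i++)
		if (addr[i])
			all_zero = false;
	return !all_zero && !(addr[0] & 0x01);
}

/* Fill addr with a random, locally administered unicast address. */
static void mac_make_random(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)rand();
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Returns true if the address had to be replaced. */
static bool mac_fixup(uint8_t addr[ETH_ALEN])
{
	if (mac_is_valid(addr))
		return false;
	mac_make_random(addr);
	return true;
}
```

Setting the locally administered bit marks the generated address as not vendor-assigned, which is what lets init scripts or an administrator recognize and override it later.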

Reported-by: Liviu Dudau 
Signed-off-by: Stephen Hemminger 


--- a/drivers/net/ethernet/marvell/sky2.c   2015-05-21 15:13:03.621126050 -0700
+++ b/drivers/net/ethernet/marvell/sky2.c   2015-08-05 16:12:38.734534467 -0700
@@ -4819,6 +4819,16 @@ static struct net_device *sky2_init_netd
memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 + port * 8,
  ETH_ALEN);
 
+   /* if the address is invalid, use a random value */
+   if (!is_valid_ether_addr(dev->dev_addr)) {
+   struct sockaddr sa = { AF_UNSPEC };
+
+   netdev_warn(dev,
+"Invalid MAC address defaulting to random\n");
+   sky2_set_mac_address(dev, &sa);
+   dev->addr_assign_type |= NET_ADDR_RANDOM;
+   }
+
return dev;
 }
 

Re: [PATCHv2 net-next 5/9] openvswitch: Add conntrack action

2015-08-05 Thread Pravin Shelar

On Tue, Aug 4, 2015 at 9:49 PM, Joe Stringer  wrote:
> Expose the kernel connection tracker via OVS. Userspace components can
> make use of the "ct()" action, followed by "recirculate", to populate
> the conntracking state in the OVS flow key, and subsequently match on
> that state.
>
> Example ODP flows allowing traffic from 1->2, only replies from 2->1:
> in_port=1,tcp,action=ct(commit,zone=1),2
> in_port=2,ct_state=-trk,tcp,action=ct(zone=1),recirc(1)
> recirc_id=1,in_port=2,ct_state=+trk+est-new,tcp,action=1
>
> IP fragments are handled by transparently assembling them as part of the
> ct action. The maximum received unit (MRU) size is tracked so that
> refragmentation can occur during output.
>
> IP frag handling contributed by Andy Zhou.
>
> Signed-off-by: Joe Stringer 
> Signed-off-by: Justin Pettit 
> Signed-off-by: Andy Zhou 
> ---
> This can be tested with the corresponding userspace component here:
> https://www.github.com/justinpettit/openvswitch conntrack
>
> v2: Don't take references to devs or dsts in output path.
> Shift ovs_ct_init()/ovs_ct_exit() into this patch
> Handle output case where flow key is invalidated
> Store the entire L2 header to apply to fragments
> Various minor simplifications
> Improve comments/logs
> Style fixes
> Rebase
> ---
>  include/uapi/linux/openvswitch.h |  41 
>  net/openvswitch/Kconfig  |  11 +
>  net/openvswitch/Makefile |   2 +
>  net/openvswitch/actions.c| 154 -
>  net/openvswitch/conntrack.c  | 475 
> +++
>  net/openvswitch/conntrack.h  |  97 
>  net/openvswitch/datapath.c   |  73 --
>  net/openvswitch/datapath.h   |   8 +
>  net/openvswitch/flow.c   |   3 +
>  net/openvswitch/flow.h   |   6 +
>  net/openvswitch/flow_netlink.c   |  72 --
>  net/openvswitch/flow_netlink.h   |   4 +-
>  net/openvswitch/vport.c  |   1 +
>  13 files changed, 910 insertions(+), 37 deletions(-)
>  create mode 100644 net/openvswitch/conntrack.c
>  create mode 100644 net/openvswitch/conntrack.h
>

I got sparse warnings:

net/openvswitch/actions.c:634:1: warning: symbol 'ovs_dst_get_mtu' was not declared. Should it be static?
net/openvswitch/actions.c:703:17: warning: cast from restricted __be16
net/openvswitch/actions.c:703:17: warning: incorrect type in argument 1 (different base types)
net/openvswitch/actions.c:703:17:    expected unsigned short [unsigned] [usertype] val
net/openvswitch/actions.c:703:17:    got restricted __be16 [usertype] ethertype
net/openvswitch/actions.c:703:17: warning: cast from restricted __be16
net/openvswitch/actions.c:703:17: warning: cast from restricted __be16



> -static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
> +static int ovs_vport_output(struct sock *sock, struct sk_buff *skb)
> +{
> +   struct ovs_frag_data *data = this_cpu_ptr(&ovs_frag_data_storage);
> +   struct vport *vport = data->vport;
> +
> +   if (skb_cow_head(skb, data->l2_len) < 0)
> +   return -ENOMEM;
> +
Need to free skb here.

> +   skb->_skb_refdst = data->dst;
I think we need to clone dst if there are multiple fragments.
Can you test this code with vxlan flow based tunnels?

> +   *OVS_CB(skb) = data->cb;
> +
> +   /* Reconstruct the MAC header.  */
> +   skb_push(skb, data->l2_len);
> +   memcpy(skb->data, &data->l2_data, data->l2_len);
> +   skb_reset_mac_header(skb);
> +   skb->protocol = eth_hdr(skb)->h_proto;
Why do we need to restore skb->protocol?
> +   skb->vlan_tci = 0;
> +
Why is vlan_tci set to zero?

> +   ovs_vport_send(vport, skb);
> +   return 0;
> +}
> +
> +unsigned int
> +ovs_dst_get_mtu(const struct dst_entry *dst)
> +{
> +   return dst->dev->mtu;
> +}
> +
...

> +static void ovs_fragment(struct vport *vport, struct sk_buff *skb,
> +unsigned int mru, __be16 ethertype)
> +{
> +   if (skb_network_offset(skb) > MAX_L2_LEN) {
> +   OVS_NLERR(1, "L2 header too long to fragment");
> +   return;
> +   }
> +
> +   if (ethertype == htons(ETH_P_IP)) {
> +   struct dst_entry ovs_dst;
> +
> +   prepare_frag(vport, skb);
> +   dst_init(&ovs_dst, &ovs_dst_ops, NULL, 1,
> +DST_OBSOLETE_NONE, DST_NOCOUNT);
> +   ovs_dst.dev = vport->dev;
> +
> +   skb_dst_set_noref(skb, &ovs_dst);
> +   IPCB(skb)->frag_max_size = mru;
> +
> +   ip_do_fragment(skb->sk, skb, ovs_vport_output);
> +   } else if (ethertype == htons(ETH_P_IPV6)) {
> +   const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
> +   struct rt6_info ovs_rt;
> +
> +   if (!v6ops) {
> +   kfree_skb(skb);
> +   return;
> +   }
> +
> +   prepare_frag(vport, skb);
> +   memset(&ovs_rt, 0,

Re: [PATCHv2 net-next 2/9] openvswitch: Move MASKED* macros to datapath.h

2015-08-05 Thread Pravin Shelar

On Tue, Aug 4, 2015 at 9:49 PM, Joe Stringer  wrote:
> This will allow the ovs-conntrack code to reuse these macros.
>
> Signed-off-by: Joe Stringer 
> Acked-by: Thomas Graf 

Acked-by: Pravin B Shelar 

Re: [PATCHv2 net-next 1/9] openvswitch: Serialize acts with original netlink len

2015-08-05 Thread Pravin Shelar

On Tue, Aug 4, 2015 at 9:49 PM, Joe Stringer  wrote:
> Previously, we used the kernel-internal netlink actions length to
> calculate the size of messages to serialize back to userspace.
> However, the sw_flow_actions may not be formatted exactly the same as the
> actions on the wire, so store the original actions length when
> de-serializing and re-use the original length when serializing.
>
> Signed-off-by: Joe Stringer 
> Acked-by: Thomas Graf 
> ---
>  net/openvswitch/datapath.c | 2 +-
>  net/openvswitch/flow.h | 1 +
>  net/openvswitch/flow_netlink.c | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index ffe984f..d5b5473 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -713,7 +713,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
>
> /* OVS_FLOW_ATTR_ACTIONS */
> if (should_fill_actions(ufid_flags))
> -   len += nla_total_size(acts->actions_len);
> +   len += nla_total_size(acts->orig_len);
>
> return len
> + nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index b62cdb3..082a87b 100644
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -144,6 +144,7 @@ struct sw_flow_id {
>
>  struct sw_flow_actions {
> struct rcu_head rcu;
> +   size_t orig_len;/* From flow_cmd_new netlink actions size */
> u32 actions_len;
> struct nlattr actions[];
>  };
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index a6eb77a..d536fb7 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1545,6 +1545,7 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
> return ERR_PTR(-ENOMEM);
>
> sfa->actions_len = 0;
> +   sfa->orig_len = size;

This gets updated every time the datapath action buffer is expanded
(ref: reserve_sfa_size()), so it does not keep the original userspace
action length.
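
The concern can be illustrated with a simplified userspace model. This is only a sketch under assumptions: the demo_* names are hypothetical, and demo_grow() stands in for a buffer expansion path that allocates a larger buffer through the same allocator.

```c
#include <stdlib.h>
#include <string.h>

struct demo_actions {
	size_t orig_len;	/* meant to hold the userspace length */
	size_t actions_len;	/* bytes currently used */
	unsigned char data[];
};

/* Allocator that records `size` as the original length on every call. */
static struct demo_actions *demo_alloc(size_t size)
{
	struct demo_actions *a = malloc(sizeof(*a) + size);

	if (!a)
		return NULL;
	a->actions_len = 0;
	a->orig_len = size;
	return a;
}

/* Expansion reuses demo_alloc(), so orig_len is overwritten. */
static struct demo_actions *demo_grow(struct demo_actions *old,
				      size_t new_size)
{
	struct demo_actions *a = demo_alloc(new_size);

	if (!a)
		return NULL;	/* caller still owns `old` */
	memcpy(a->data, old->data, old->actions_len);
	a->actions_len = old->actions_len;
	free(old);
	return a;
}
```

After one expansion from 16 to 32 bytes, orig_len in this model reads 32 rather than the 16 that was first requested. One way to avoid that would be to copy the old orig_len across in the grow path, or to set it only at the call site that handles the userspace request.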

Re: [PATCHv2 net-next 3/9] ipv6: Export nf_ct_frag6_gather()

2015-08-05 Thread Pravin Shelar

On Tue, Aug 4, 2015 at 9:49 PM, Joe Stringer  wrote:
> Signed-off-by: Joe Stringer 
> Acked-by: Thomas Graf 
> ---
Acked-by: Pravin B Shelar 

[PATCH net-next 1/1] lan78xx: Fix Smatch warnings

2015-08-05 Thread Woojung.Huh

Fix Smatch warnings.
- lan78xx.c:2282 tx_complete() warn: variable dereferenced before check 'skb' (see line 2249)
- lan78xx.c:2885 lan78xx_bh() info: ignoring unreachable code.
- lan78xx.c:3159 lan78xx_probe() info: ignoring unreachable code.
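
For readers unfamiliar with the first warning, the pattern it reports can be sketched in standalone C. The demo_* names are hypothetical and this is not the lan78xx code; it only shows the shape of the problem.

```c
#include <stddef.h>

struct demo_buf {
	int len;
};

/* Flagged pattern: `b` is dereferenced first, so the later NULL check
 * is either redundant or comes too late to help.
 */
static int demo_len_flagged(const struct demo_buf *b)
{
	int len = b->len;

	if (!b)
		return -1;
	return len;
}

/* One way to resolve it: test the pointer before the first dereference. */
static int demo_len_checked(const struct demo_buf *b)
{
	if (!b)
		return -1;
	return b->len;
}
```

The other resolution, when the pointer can never be NULL at that point, is to drop the late check altogether.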

Reported-by: Dan Carpenter 

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 52 ++-
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index ec8bd34..3ac405f 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -291,7 +291,7 @@ static int lan78xx_read_reg(struct lan78xx_net *dev, u32 index, u32 *data)
u32 *buf = kmalloc(sizeof(u32), GFP_KERNEL);
int ret;
 
-   BUG_ON(!dev);
+   WARN_ON_ONCE(!dev);
 
if (!buf)
return -ENOMEM;
@@ -319,7 +319,7 @@ static int lan78xx_write_reg(struct lan78xx_net *dev, u32 index, u32 data)
u32 *buf = kmalloc(sizeof(u32), GFP_KERNEL);
int ret;
 
-   BUG_ON(!dev);
+   WARN_ON_ONCE(!dev);
 
if (!buf)
return -ENOMEM;
@@ -351,9 +351,9 @@ static int lan78xx_read_stats(struct lan78xx_net *dev,
u32 *src;
u32 *dst;
 
-   BUG_ON(!dev);
-   BUG_ON(!data);
-   BUG_ON(sizeof(struct lan78xx_statstage) != 0xBC);
+   WARN_ON_ONCE(!dev);
+   WARN_ON_ONCE(!data);
+   WARN_ON_ONCE(sizeof(struct lan78xx_statstage) != 0xBC);
 
stats = kmalloc(sizeof(*stats), GFP_KERNEL);
if (!stats)
@@ -687,8 +687,8 @@ static int lan78xx_read_raw_eeprom(struct lan78xx_net *dev, 
u32 offset,
u32 val;
int i, ret;
 
-   BUG_ON(!dev);
-   BUG_ON(!data);
+   WARN_ON_ONCE(!dev);
+   WARN_ON_ONCE(!data);
 
ret = lan78xx_eeprom_confirm_not_busy(dev);
if (ret)
@@ -737,8 +737,8 @@ static int lan78xx_write_raw_eeprom(struct lan78xx_net 
*dev, u32 offset,
u32 val;
int i, ret;
 
-   BUG_ON(!dev);
-   BUG_ON(!data);
+   WARN_ON_ONCE(!dev);
+   WARN_ON_ONCE(!data);
 
ret = lan78xx_eeprom_confirm_not_busy(dev);
if (ret)
@@ -2221,19 +2221,19 @@ static enum skb_state defer_bh(struct lan78xx_net *dev, 
struct sk_buff *skb,
old_state = entry->state;
entry->state = state;
if (!list->prev)
-   BUG_ON(!list->prev);
+   WARN_ON_ONCE(!list->prev);
if (!list->next)
-   BUG_ON(!list->next);
+   WARN_ON_ONCE(!list->next);
if (!skb->prev || !skb->next)
-   BUG_ON(true);
+   WARN_ON_ONCE(true);
 
__skb_unlink(skb, list);
spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
if (!dev->done.prev)
-   BUG_ON(!dev->done.prev);
+   WARN_ON_ONCE(!dev->done.prev);
if (!dev->done.next)
-   BUG_ON(!dev->done.next);
+   WARN_ON_ONCE(!dev->done.next);
 
__skb_queue_tail(&dev->done, skb);
if (skb_queue_len(&dev->done) == 1)
@@ -2279,8 +2279,7 @@ static void tx_complete(struct urb *urb)
 
usb_autopm_put_interface_async(dev->intf);
 
-   if (skb)
-   defer_bh(dev, skb, &dev->txq, tx_done);
+   defer_bh(dev, skb, &dev->txq, tx_done);
 }
 
 static void lan78xx_queue_skb(struct sk_buff_head *list,
@@ -2295,13 +2294,15 @@ static void lan78xx_queue_skb(struct sk_buff_head *list,
 netdev_tx_t lan78xx_start_xmit(struct sk_buff *skb, struct net_device *net)
 {
struct lan78xx_net *dev = netdev_priv(net);
+   struct sk_buff *skb2 = NULL;
 
-   if (skb)
+   if (skb) {
skb_tx_timestamp(skb);
+   skb2 = lan78xx_tx_prep(dev, skb, GFP_ATOMIC);
+   }
 
-   skb = lan78xx_tx_prep(dev, skb, GFP_ATOMIC);
-   if (skb) {
-   skb_queue_tail(&dev->txq_pend, skb);
+   if (skb2) {
+   skb_queue_tail(&dev->txq_pend, skb2);
 
if (skb_queue_len(&dev->txq_pend) > 10)
netif_stop_queue(net);
@@ -2749,7 +2750,7 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
pos += roundup(skb2->len, sizeof(u32));
dev_kfree_skb(skb2);
} else {
-   BUG_ON(true);
+   WARN_ON_ONCE(true);
}
}
 
@@ -2859,9 +2860,9 @@ static void lan78xx_bh(unsigned long param)
struct skb_data *entry;
 
if (!dev->done.prev)
-   BUG_ON(!dev->done.prev);
+   WARN_ON_ONCE(!dev->done.prev);
if (!dev->done.next)
-   BUG_ON(!dev->done.next);
+   WARN_ON_ONCE(!dev->done.next);
 
while ((skb = skb_dequeue(&dev->done))) {
entry = (struct skb_data *)(skb->cb);
@@ -2882,10 +2883,6 @@ static void lan78xx_bh(unsigned long param)
netdev_dbg(dev->net, "skb state %d\n", entry->state);

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Francois Romieu

Liviu Dudau  :
> On Wed, Aug 05, 2015 at 05:40:57PM +0100, Stephen Hemminger wrote:
[...]
> > Yes, I can see that this can be a real problem, and other drivers
> > solve the problem. The standard method is to assign a random mac address
> > (and then let scripts overwrite) rather than introducing module parameter.
> > Module parameters are discouraged because they are device specific.
> > 
> 
> I agree. However, in my case, the board people have assigned MAC addresses
> to the chip; they just didn't build the board in such a way as to allow one
> to store that MAC address in a permanent way :( And no, I can't use the DT
> because the chip is actually on the PCIe bus.
> 
> Even with the generation of a random address, it still needs to be copied
> into the device, so I would guess that a version of the patch I've sent is
> still relevant?

Assuming a random address is generated, could you elaborate on what is needed
that sky2_set_mac_address fails to provide?

-- 
Ueimor
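
For readers following the random-address suggestion, here is a minimal
userspace sketch of the idea. It models what the kernel helpers
eth_hw_addr_random() and is_valid_ether_addr() are understood to do; the
names mac_random() and mac_is_valid() are hypothetical, and rand() is used
only for illustration, not as a recommendation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Valid here means: not all-zero and not multicast (low bit of octet 0). */
int mac_is_valid(const uint8_t mac[MAC_LEN])
{
	static const uint8_t zero[MAC_LEN];

	return memcmp(mac, zero, MAC_LEN) != 0 && !(mac[0] & 0x01);
}

/* Fill mac with pseudo-random bytes, then force a unicast,
 * locally administered address. rand() is only for illustration. */
void mac_random(uint8_t mac[MAC_LEN])
{
	int i;

	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= 0xfe;	/* clear the multicast bit */
	mac[0] |= 0x02;	/* set the locally administered bit */
}
```

A driver-side fallback would call the validity check on whatever the
registers hold and only generate a random address when that check fails.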

[PATCH] net: Unbreak resetting default values for tcp_wmem/udp_wmem_min

2015-08-05 Thread Calvin Owens

Commit 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.

This change is fine for tcp_rmem and udp_rmem_min, since SOCK_MIN_RCVBUF
is equal to TCP_SKB_MIN_TRUESIZE. But it breaks tcp_wmem and
udp_wmem_min for previously valid values because SOCK_MIN_SNDBUF is
(2 * TCP_SKB_MIN_TRUESIZE), which ends up being greater than 4KB.

Thus, 4096 is no longer accepted as a valid value, despite still being
the default for udp_wmem_min, and for 'min' in tcp_wmem. A huge number
of sysctl configurations at FB use 4096 as 'min', so this change breaks
all of them.

This patch changes the sysctls to simply enforce that the value written
is greater than or equal to the default value of SK_MEM_QUANTUM.

Fixes: 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
Signed-off-by: Calvin Owens 
---
 net/ipv4/sysctl_net_ipv4.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 433231c..a214b6a 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -41,8 +41,7 @@ static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
-static int min_sndbuf = SOCK_MIN_SNDBUF;
-static int min_rcvbuf = SOCK_MIN_RCVBUF;
+static int min_buf = SK_MEM_QUANTUM;
 
 /* Update system visible IP port range */
 static void set_local_port_range(struct net *net, int range[2])
@@ -530,7 +529,7 @@ static struct ctl_table ipv4_table[] = {
.maxlen = sizeof(sysctl_tcp_wmem),
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
-   .extra1 = &min_sndbuf,
+   .extra1 = &min_buf,
},
{
.procname   = "tcp_notsent_lowat",
@@ -545,7 +544,7 @@ static struct ctl_table ipv4_table[] = {
.maxlen = sizeof(sysctl_tcp_rmem),
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
-   .extra1 = &min_rcvbuf,
+   .extra1 = &min_buf,
},
{
.procname   = "tcp_app_win",
@@ -758,7 +757,7 @@ static struct ctl_table ipv4_table[] = {
.maxlen = sizeof(sysctl_udp_rmem_min),
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
-   .extra1 = &min_rcvbuf,
+   .extra1 = &min_buf,
},
{
.procname   = "udp_wmem_min",
@@ -766,7 +765,7 @@ static struct ctl_table ipv4_table[] = {
.maxlen = sizeof(sysctl_udp_wmem_min),
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
-   .extra1 = &min_sndbuf,
+   .extra1 = &min_buf,
},
{ }
 };
-- 
1.8.1
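
The arithmetic in the commit message above can be sketched in userspace C.
This is a rough model, not kernel code: it assumes TCP_SKB_MIN_TRUESIZE is
2048 plus the cache-aligned size of struct sk_buff, that SOCK_MIN_SNDBUF is
twice that, and that SK_MEM_QUANTUM equals a 4096-byte page. The sk_buff
size and cache-line size are placeholder assumptions, not values from any
particular build.

```c
#include <assert.h>

/* Assumed placeholders: real values depend on kernel config and arch. */
#define ASSUMED_SKB_SIZE       232u  /* stand-in for sizeof(struct sk_buff) */
#define ASSUMED_CACHE_LINE     64u   /* stand-in for SMP_CACHE_BYTES */
#define ASSUMED_SK_MEM_QUANTUM 4096u /* assumes PAGE_SIZE == 4096 */

/* Round x up to a multiple of the assumed cache line size. */
unsigned int data_align(unsigned int x)
{
	return (x + ASSUMED_CACHE_LINE - 1) & ~(ASSUMED_CACHE_LINE - 1);
}

unsigned int tcp_skb_min_truesize(void)
{
	return 2048u + data_align(ASSUMED_SKB_SIZE);
}

unsigned int sock_min_rcvbuf(void)
{
	return tcp_skb_min_truesize();
}

unsigned int sock_min_sndbuf(void)
{
	return 2u * tcp_skb_min_truesize();
}

/* Model of a proc_dointvec_minmax lower-bound check: 1 if accepted. */
int min_check(unsigned int val, unsigned int min)
{
	return val >= min;
}
```

Whatever the real sk_buff size is, twice (2048 + something) exceeds 4096,
which is why 4096 is rejected against the send-buffer minimum but accepted
against a page-sized minimum.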


[PATCH RFC] ipvlan: Problems with ipvlan and IPv6

2015-08-05 Thread Tom Herbert

Just doing:

ip link add name ipvl0 link eth0 type ipvlan
ip -6 addr add :0:0:1::0:1:1/128 dev ipvl0
ip addr add 192.168.1.1/32 dev ipvl0

Before patch:

  ping :0:0:1::0:1:1

   - No response, tcpdump on ipvl0 shows two copies of each ICMP echo
 request.

  ping 192.168.1.1

   - Works, tcpdump on ipvl0 shows nothing

With patch applied:

  ping :0:0:1::0:1:1

   - Works, tcpdump on ipvl0 shows nothing

  ping 192.168.1.1

   - Works, tcpdump on ipvl0 shows nothing

Some other problems I'm seeing
-  Occurrence of "Dead loop on virtual device ipvl0, fix it urgently!"

   Running the following script several times seems to trigger the error.
   Once in this state, the machine is pretty much locked up:

 rmmod ipvlan

 for i  in `seq 0 199`;
 do
ip link add name ipvl$i link eth0 type ipvlan
addr=`echo $i | awk '{printf("%04x", $1)}'`
ip -6 addr add :0:0:1::0:1:$addr/128 dev ipvl$i
ifconfig ipvl$i up
 done

- Seeing "Dropped {multi|broad}cast of type= [86dd]"

  Looks like this may be from ND messages. Might be innocuous I
  suppose, but it is annoying to see in logs.

Signed-off-by: Tom Herbert 
---
 drivers/net/ipvlan/ipvlan_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 207f62e..0b68538 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -277,8 +277,8 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct 
sk_buff *skb,
skb->pkt_type = PACKET_HOST;
 
if (local) {
-   if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
-   success = true;
+   netif_receive_skb(skb);
+   success = true;
} else {
ret = RX_HANDLER_ANOTHER;
success = true;
@@ -384,7 +384,7 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb)
struct dst_entry *dst;
int err, ret = NET_XMIT_DROP;
struct flowi6 fl6 = {
-   .flowi6_iif = skb->dev->ifindex,
+   .flowi6_oif = dev_get_iflink(dev),
.daddr = ip6h->daddr,
.saddr = ip6h->saddr,
.flowi6_flags = FLOWI_FLAG_ANYSRC,
-- 
1.8.1


Re: [PATCH 09/10] net: Use VRF device index for socket lookups

2015-08-05 Thread David Ahern

Hi Tom:

On 8/5/15 12:32 PM, Tom Herbert wrote:

> On Wed, Aug 5, 2015 at 10:14 AM, David Ahern wrote:
>> The intent of the VRF device is to leverage the existing SO_BINDTODEVICE
>> as a means of creating L3 domains. Since sockets are expected to be bound
>> to the VRF device the index of the master device needs to be used for
>> socket lookups.
>
> This patch set seems awfully invasive at the socket layer. Isn't there
> any way this functionality can be contained in the routing layer, with
> sockets using the existing API?

This patch is a leftover from earlier versions. It is no longer needed. 
Will drop for v5.

David

Re: rtnl_mutex deadlock?

2015-08-05 Thread Daniel Borkmann


On 08/05/2015 10:44 AM, Linus Torvalds wrote:

> On Wed, Aug 5, 2015 at 9:43 AM, Jiri Pirko wrote:
>> Indeed. Most probably, NETLINK_CB(skb).portid got zeroed.
>>
>> Linus, are you able to reproduce this or is it a one-time issue?
>
> I don't think I'm able to reproduce this, it's happened only once so far.


Here's a theory and a patch below. Herbert, Thomas, does this make any
sense to you, or at least sound plausible? ;)

I'm not quite sure what's best to return from here, i.e. whether we
propagate -ENOMEM or instead retry over and over again hoping that the
rehashing completed (and no new rehashing started in the meantime) ...

The rehashing could take quite some time on large hashtables and given
we can also fail with -ENOMEM from rhashtable_insert_rehash() when we
cannot allocate a bucket table, it's probably okay to go with -ENOMEM?


[PATCH net] netlink, rhashtable: fix deadlock when grabbing rtnl_mutex

Linus reports the following deadlock on rtnl_mutex; triggered only
once so far:

[12236.694209] NetworkManager  D 00013b80 0  1047  1 0x
[12236.694218]  88003f902640  815d15a9 
0018
[12236.694224]  880119538000 88003f902640 81a8ff84 

[12236.694230]  81a8ff88 880119c47f00 815d133a 
81a8ff80
[12236.694235] Call Trace:
[12236.694250]  [] ? schedule_preempt_disabled+0x9/0x10
[12236.694257]  [] ? schedule+0x2a/0x70
[12236.694263]  [] ? schedule_preempt_disabled+0x9/0x10
[12236.694271]  [] ? __mutex_lock_slowpath+0x7f/0xf0
[12236.694280]  [] ? mutex_lock+0x16/0x30
[12236.694291]  [] ? rtnetlink_rcv+0x10/0x30
[12236.694299]  [] ? netlink_unicast+0xfb/0x180
[12236.694309]  [] ? rtnl_getlink+0x113/0x190
[12236.694319]  [] ? rtnetlink_rcv_msg+0x7a/0x210
[12236.694331]  [] ? sock_has_perm+0x5c/0x70
[12236.694339]  [] ? rtnetlink_rcv+0x30/0x30
[12236.694346]  [] ? netlink_rcv_skb+0x9c/0xc0
[12236.694354]  [] ? rtnetlink_rcv+0x1f/0x30
[12236.694360]  [] ? netlink_unicast+0xfb/0x180
[12236.694367]  [] ? netlink_sendmsg+0x484/0x5d0
[12236.694376]  [] ? __wake_up+0x2f/0x50
[12236.694387]  [] ? sock_sendmsg+0x33/0x40
[12236.694396]  [] ? ___sys_sendmsg+0x22e/0x240
[12236.694405]  [] ? ___sys_recvmsg+0x135/0x1a0
[12236.694415]  [] ? eventfd_write+0x82/0x210
[12236.694423]  [] ? fsnotify+0x32e/0x4c0
[12236.694429]  [] ? wake_up_q+0x60/0x60
[12236.694434]  [] ? __sys_sendmsg+0x39/0x70
[12236.694440]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6a

The recursive call into rtnetlink_rcv() looks suspicious. One way this
could trigger is that the sender's NETLINK_CB(skb).portid was wrongly 0
(which is the rtnetlink socket), so the rtnl_getlink() request's answer
would be sent to the kernel instead of to the actual user process, thus
grabbing rtnl_mutex twice.

One theory of how we could end up with a NETLINK_CB(skb).portid of 0 on a
user space process is that we start out from netlink_sendmsg() with an
unbound portid, so that we need to do netlink_autobind().

Here, we would need to have an error of 0 returned, so that we can
continue with sending the frame and setting NETLINK_CB(skb).portid to 0
eventually. I.e. in netlink_autobind(), we need to return with -EBUSY
from netlink_insert(), so that the error code gets overwritten with 0.

In order to get to this point, the inner __netlink_insert() must return
with -EBUSY so that we reset the socket's portid to 0, and violate the 2nd
rule documented in d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs."),
where it seemed to be a very similar issue that got fixed.

There's one possibility where the rhashtable backend could in fact return
with -EBUSY. The insert is done via rhashtable_lookup_insert_key(), which
invokes __rhashtable_insert_fast(). From here, we need to trigger the
slow path with rhashtable_insert_rehash(), which can return -EBUSY in
case a rehash of the hashtable is currently already in progress.

This error propagates back to __netlink_insert() and provides us the
needed precondition. Looks like the -EBUSY was introduced first in
ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion"). So,
as -EBUSY must not escape from there, we would need to remap it to a
different error code for user space. As the current rhashtable cannot
take any inserts in that case, it could be mapped to -ENOMEM.
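
The error flow described above can be modelled in a few lines of userspace
C. This is only a sketch of the reasoning, not the patch: autobind_result()
and remap_backend_err() are hypothetical stand-ins for the behaviour the
text attributes to netlink_autobind() and to the proposed remapping, and
nothing here is taken from af_netlink.c itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the autobind caller as described above: -EBUSY
 * from the insert is taken to mean "already bound" and becomes success. */
int autobind_result(int insert_err)
{
	return insert_err == -EBUSY ? 0 : insert_err;
}

/* Hypothetical remap at the insert boundary: a rehash-in-progress
 * -EBUSY from the hash table backend is reported as -ENOMEM instead. */
int remap_backend_err(int backend_err)
{
	return backend_err == -EBUSY ? -ENOMEM : backend_err;
}
```

Without the remap, a backend -EBUSY is swallowed and the send continues
with portid 0; with it, the caller sees a real error.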

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Reported-by: Linus Torvalds 
Signed-off-by: Daniel Borkmann 
---
 net/netlink/af_netlink.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index d8e2e39..1cfd4af 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1096,6 +1096,11 @@ static int netlink_insert(struct sock *sk, u32 portid)

err = __netlink_insert(table, sk);
if (err) {
+   /* Currently, a rehashing of rhashtable might be in progress,
+* we however must not allow -EBUSY to

Re: [PATCH 09/10] net: Use VRF device index for socket lookups

2015-08-05 Thread Tom Herbert

On Wed, Aug 5, 2015 at 10:14 AM, David Ahern  wrote:
> The intent of the VRF device is to leverage the existing SO_BINDTODEVICE
> as a means of creating L3 domains. Since sockets are expected to be bound
> to the VRF device the index of the master device needs to be used for
> socket lookups.
>
This patch set seems awfully invasive at the socket layer. Isn't there
any way this functionality can be contained in the routing layer, with
sockets using the existing API?

Thanks,
Tom

> Signed-off-by: Shrijeet Mukherjee 
> Signed-off-by: David Ahern 
> ---
>  net/ipv4/syncookies.c |  5 -
>  net/ipv4/tcp_input.c  |  6 +-
>  net/ipv4/tcp_ipv4.c   | 11 +--
>  3 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index d70b1f603692..e5c8b1240278 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  extern int sysctl_tcp_syncookies;
>
> @@ -348,7 +349,9 @@ struct sock *cookie_v4_check(struct sock *sk, struct 
> sk_buff *skb)
> treq->snt_synack= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsecr : 0;
> treq->tfo_listener  = false;
>
> -   ireq->ir_iif = sk->sk_bound_dev_if;
> +   ireq->ir_iif = vrf_master_ifindex_by_index(sock_net(sk), 
> skb->skb_iif);
> +   if (!ireq->ir_iif)
> +   ireq->ir_iif = sk->sk_bound_dev_if;
>
> /* We throwed the options of the initial SYN away, so we hope
>  * the ACK carries the same options again (see RFC1122 4.2.3.8)
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4e4d6bcd0ca9..6b96240a4055 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -72,6 +72,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -6141,7 +6142,10 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> tcp_openreq_init(req, &tmp_opt, skb, sk);
>
> /* Note: tcp_v6_init_req() might override ir_iif for link locals */
> -   inet_rsk(req)->ir_iif = sk->sk_bound_dev_if;
> +   inet_rsk(req)->ir_iif = vrf_master_ifindex_by_index(sock_net(sk),
> +   skb->skb_iif);
> +   if (!inet_rsk(req)->ir_iif)
> +   inet_rsk(req)->ir_iif = sk->sk_bound_dev_if;
>
> af_ops->init_req(req, sk, skb);
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index d27eb549ced6..0f8ed98a2e64 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -75,6 +75,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -682,6 +683,8 @@ static void tcp_v4_send_reset(struct sock *sk, struct 
> sk_buff *skb)
>  */
> if (sk)
> arg.bound_dev_if = sk->sk_bound_dev_if;
> +   if (!arg.bound_dev_if && skb->dev)
> +   arg.bound_dev_if = vrf_master_ifindex_rcu(skb->dev);
>
> arg.tos = ip_hdr(skb)->tos;
> ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
> @@ -766,8 +769,10 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 
> seq, u32 ack,
>   ip_hdr(skb)->saddr, /* XXX */
>   arg.iov[0].iov_len, IPPROTO_TCP, 0);
> arg.csumoffset = offsetof(struct tcphdr, check) / 2;
> -   if (oif)
> -   arg.bound_dev_if = oif;
> +   arg.bound_dev_if = oif ? : vrf_master_ifindex_rcu(skb_dst(skb)->dev);
> +   if (!arg.bound_dev_if)
> +   arg.bound_dev_if = vrf_master_ifindex_rcu(skb->dev);
> +
> arg.tos = tos;
> ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
>   skb, &TCP_SKB_CB(skb)->header.h4.opt,
> @@ -1269,6 +1274,8 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, 
> struct sk_buff *skb,
> ireq  = inet_rsk(req);
> sk_daddr_set(newsk, ireq->ir_rmt_addr);
> sk_rcv_saddr_set(newsk, ireq->ir_loc_addr);
> +   if (netif_index_is_vrf(sock_net(newsk), ireq->ir_iif))
> +   newsk->sk_bound_dev_if = ireq->ir_iif;
> newinet->inet_saddr   = ireq->ir_loc_addr;
> inet_opt  = ireq->opt;
> rcu_assign_pointer(newinet->inet_opt, inet_opt);
> --
> 2.3.2 (Apple Git-55)
>

Re: [ovs-dev] [PATCH net-next v3] openvswitch: Make 100 percents packets sampled when sampling rate is 1.

2015-08-05 Thread Pravin Shelar

On Wed, Aug 5, 2015 at 12:30 AM, Wenyu Zhang  wrote:
> When the sampling rate is 1, the sampling probability is UINT32_MAX. The
> packet should be sampled even if prandom32() generates UINT32_MAX, and no
> packet needs to be sampled when the probability is 0.
>
> Signed-off-by: Wenyu Zhang 

Acked-by: Pravin B Shelar 

Thanks.
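
The boundary behaviour described in the quoted commit message can be
sketched as below. This is an illustrative userspace model, not the patch
(which is not quoted in this reply); should_sample() is a hypothetical name
and the random draw is passed in as a parameter so the edge cases are easy
to check.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether to sample, given the configured probability and one
 * 32-bit random draw. probability == 0 never samples; probability ==
 * UINT32_MAX always samples, even when the draw is UINT32_MAX. */
int should_sample(uint32_t probability, uint32_t rnd)
{
	if (probability == 0)
		return 0;
	return rnd <= probability;
}
```

The inclusive comparison is what makes a probability of UINT32_MAX mean
"every packet"; a strict less-than would drop the packet whenever the draw
happens to be UINT32_MAX.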

Re: [BUG] net/ipv4: inconsistent routing table

2015-08-05 Thread Alexander Duyck


On 08/05/2015 02:06 AM, Daniel Borkmann wrote:

[ please cc netdev ]

On 08/05/2015 10:56 AM, Zang MingJie wrote:

Hi:

I found a bug when removing an IP address that is referenced by a 
routing entry.


Steps to reproduce:

ip li add type dummy
ip li set dummy0 up
ip ad add 10.0.0.1/24 dev dummy0
ip ad add 10.0.0.2/24 dev dummy0


Okay, so up to this point you have 2 addresses on the same subnet that 
are now on dummy0.



ip ro add default via 10.0.0.2/24


This makes the default route go through 10.0.0.2.


ip ad del 10.0.0.2/24 dev dummy0


Then you remove 10.0.0.2 from the local system. However, since 10.0.0.1 
is on the same subnet, dummy0 is still the correct interface for reaching 
10.0.0.2; it is just no longer local to the system.



After deleting the secondary IP address, the routing entry still
points to 10.0.0.2


You didn't delete the default routing entry so why would you expect it 
to change?  All you did is remove 10.0.0.2 from the local system.  I 
believe the assumption is that 10.0.0.2 is still out there somewhere, it 
just isn't on the local system anymore.



# ip ro
default via 10.0.0.2 dev dummy0
10.0.0.0/24 dev dummy0  proto kernel  scope link  src 10.0.0.1


This matches up with what I would expect.  10.0.0.2 is the default 
gateway and it is accessible from dummy0 since 10.0.0.0/24 is accessible 
from dummy0.



But actually, the kernel considers the default route to be directly connected.

# ip ro get 1.1.1.1
1.1.1.1 dev dummy0  src 10.0.0.1
 cache


I'm not sure how you came to the "directly connected" conclusion. It is 
still routing things out through 10.0.0.2 from 10.0.0.1.


Maybe your example would work better if you used 10.0.0.1 and 10.0.1.1 
instead.  Then I think you might be able to better see that when you 
delete the second address the route would be broken.


- Alex
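
The on-link reasoning in this reply boils down to a prefix match: a gateway
stays reachable through an interface as long as it falls inside a prefix
still configured there. A minimal userspace sketch of that check follows;
in_subnet() is a hypothetical helper, and the kernel's FIB lookup does far
more than this.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Return 1 if dotted-quad addr falls inside net/prefix_len, 0 if not,
 * -1 on a parse error or bad prefix length. */
int in_subnet(const char *addr, const char *net, int prefix_len)
{
	struct in_addr a, n;
	uint32_t mask;

	if (prefix_len < 0 || prefix_len > 32)
		return -1;
	if (inet_pton(AF_INET, addr, &a) != 1 ||
	    inet_pton(AF_INET, net, &n) != 1)
		return -1;

	/* A shift by 32 is undefined in C, so handle /0 separately. */
	mask = prefix_len ? 0xffffffffu << (32 - prefix_len) : 0;
	return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}
```

With the addresses from the report, 10.0.0.2 is still inside 10.0.0.0/24
after the deletion, whereas an address such as 10.0.1.1 would not be
covered by that prefix.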

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Liviu Dudau

On Wed, Aug 05, 2015 at 06:15:37PM +0100, Ryan Harkin wrote:
>On 5 August 2015 at 16:50, Liviu Dudau <liviu.du...@arm.com> wrote:
> 
>  For designs where EEPROMs are not connected to PCI Yukon2
>  chips we need to get the MAC address from the firmware.
>  Add a module parameter called 'mac_address' for this. It
>  will be used if no DT node can be found and the B2_MAC
>  register holds an invalid value.
> 
>  Signed-off-by: Liviu Dudau <liviu.du...@arm.com>
> 
>Looks good to me.  FWIW, you can have a tested or reviewed-by at your
>preference:
>Tested-by: Ryan Harkin <ryan.har...@linaro.org>
>Reviewed-by: Ryan Harkin <ryan.har...@linaro.org>
> 

Thanks Ryan, I think one can provide both tags, so I will use them together.

Best regards,
Liviu

> 
> 
>  ---
>   drivers/net/ethernet/marvell/sky2.c | 14 +-
>   1 file changed, 13 insertions(+), 1 deletion(-)
> 
>  diff --git a/drivers/net/ethernet/marvell/sky2.c 
> b/drivers/net/ethernet/marvell/sky2.c
>  index d9f4498..a977d95 100644
>  --- a/drivers/net/ethernet/marvell/sky2.c
>  +++ b/drivers/net/ethernet/marvell/sky2.c
>  @@ -101,6 +101,10 @@ static int legacy_pme = 0;
>   module_param(legacy_pme, int, 0);
>   MODULE_PARM_DESC(legacy_pme, "Legacy power management");
> 
>  +/* Ugh!  Let the firmware tell us the hardware address */
>  +static int mac_address[ETH_ALEN] = { 0, };
>  +module_param_array(mac_address, int, NULL, 0);
>  +
>   static const struct pci_device_id sky2_id_table[] = {
>          { PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) }, /* SK-9Sxx */
>          { PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) }, /* SK-9Exx */
>  @@ -4811,13 +4815,21 @@ static struct net_device 
> *sky2_init_netdev(struct sky2_hw *hw, unsigned port,
>          /* try to get mac address in the following order:
>           * 1) from device tree data
>           * 2) from internal registers set by bootloader
>  +        * 3) from the command line parameter
>           */
>          iap = of_get_mac_address(hw->pdev->dev.of_node);
>          if (iap)
>                  memcpy(dev->dev_addr, iap, ETH_ALEN);
>  -       else
>  +       else {
>                  memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 + port 
> * 8,
>                                ETH_ALEN);
>  +               if (!is_valid_ether_addr(&dev->dev_addr[0])) {
>  +                       int i;
>  +
>  +                       for (i = 0; i < ETH_ALEN; i++)
>  +                               dev->dev_addr[i] = mac_address[i];
>  +               }
>  +       }
> 
>          return dev;
>   }
>  --
>  2.4.6
> 

-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Liviu Dudau

On Wed, Aug 05, 2015 at 05:40:57PM +0100, Stephen Hemminger wrote:
> On Wed,  5 Aug 2015 16:50:54 +0100
> Liviu Dudau  wrote:
> 
> > For designs where EEPROMs are not connected to PCI Yukon2
> > chips we need to get the MAC address from the firmware.
> > Add a module parameter called 'mac_address' for this. It
> > will be used if no DT node can be found and the B2_MAC
> > register holds an invalid value.
> > 
> > Signed-off-by: Liviu Dudau 
> 
> Yes, I can see that this can be a real problem, and other drivers
> solve the problem. The standard method is to assign a random mac address
> (and then let scripts overwrite) rather than introducing module parameter.
> Module parameters are discouraged because they are device specific.
> 

I agree. However, in my case, the board people have assigned MAC addresses
to the chip; they just didn't build the board in such a way as to allow one
to store that MAC address in a permanent way :( And no, I can't use the DT
because the chip is actually on the PCIe bus.

Even with the generation of a random address, it still needs to be copied
into the device, so I would guess that a version of the patch I've sent is
still relevant?

Best regards,
Liviu

>  
> 

-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


[PATCH 06/10] net: Fix up inet_addr_type checks

2015-08-05 Thread David Ahern

Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device, local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern 
---
 include/net/route.h  |  3 +++
 net/ipv4/af_inet.c   | 13 -
 net/ipv4/arp.c   | 15 +--
 net/ipv4/fib_frontend.c  | 28 +---
 net/ipv4/fib_semantics.c |  6 --
 net/ipv4/icmp.c  |  5 +++--
 6 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6ba681f0b98d..6dda2c1bf8c6 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -192,6 +192,9 @@ unsigned int inet_addr_type(struct net *net, __be32 addr);
 unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id);
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr);
+unsigned int inet_addr_type_dev_table(struct net *net,
+ const struct net_device *dev,
+ __be32 addr);
 void ip_rt_multicast_event(struct in_device *);
 int ip_rt_ioctl(struct net *, unsigned int cmd, void __user *arg);
 void ip_rt_get_source(u8 *src, struct sk_buff *skb, struct rtable *rt);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cc4e498a0ccf..96fba4f63454 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -119,6 +119,7 @@
 #ifdef CONFIG_IP_MROUTE
 #include 
 #endif
+#include 
 
 
 /* The inetsw table contains everything that inet_create needs to
@@ -427,6 +428,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, 
int addr_len)
struct net *net = sock_net(sk);
unsigned short snum;
int chk_addr_ret;
+   int tb_id = 0;
int err;
 
/* If the socket has its own bind function then use it. (RAW) */
@@ -448,7 +450,16 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, 
int addr_len)
goto out;
}
 
-   chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr);
+   if (sk->sk_bound_dev_if) {
+   struct net_device *dev;
+
+   rcu_read_lock();
+   dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
+   if (dev)
+   tb_id = vrf_dev_table_rcu(dev);
+   rcu_read_unlock();
+   }
+   chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);
 
/* Not specified by any standard per-se, however it breaks too
 * many applications when removed.  It is unfortunate since
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 34a308573f4b..30409b75e925 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -233,7 +233,7 @@ static int arp_constructor(struct neighbour *neigh)
return -EINVAL;
}
 
-   neigh->type = inet_addr_type(dev_net(dev), addr);
+   neigh->type = inet_addr_type_dev_table(dev_net(dev), dev, addr);
 
parms = in_dev->arp_parms;
__neigh_parms_put(neigh->parms);
@@ -343,7 +343,7 @@ static void arp_solicit(struct neighbour *neigh, struct 
sk_buff *skb)
switch (IN_DEV_ARP_ANNOUNCE(in_dev)) {
default:
case 0: /* By default announce any local IP */
-   if (skb && inet_addr_type(dev_net(dev),
+   if (skb && inet_addr_type_dev_table(dev_net(dev), dev,
  ip_hdr(skb)->saddr) == RTN_LOCAL)
saddr = ip_hdr(skb)->saddr;
break;
@@ -351,7 +351,8 @@ static void arp_solicit(struct neighbour *neigh, struct 
sk_buff *skb)
if (!skb)
break;
saddr = ip_hdr(skb)->saddr;
-   if (inet_addr_type(dev_net(dev), saddr) == RTN_LOCAL) {
+   if (inet_addr_type_dev_table(dev_net(dev), dev,
+saddr) == RTN_LOCAL) {
/* saddr should be known to target */
if (inet_addr_onlink(in_dev, target, saddr))
break;
@@ -751,7 +752,7 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
/* Special case: IPv4 duplicate address detection packet (RFC2131) */
if (sip == 0) {
if (arp->ar_op == htons(ARPOP_REQUEST) &&
-   inet_addr_type(net, tip) == RTN_LOCAL &&
+   inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
!arp_ignore(in_dev, sip, tip))
arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,

[PATCH 07/10] net: Add routes to the table associated with the device

2015-08-05 Thread David Ahern

When a device associated with a VRF is brought up or down, routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table associated with the device if there is one.

A part of this is directing prefsrc validations to the correct
table as well.

Signed-off-by: David Ahern 
---
 net/ipv4/fib_frontend.c  |  8 
 net/ipv4/fib_semantics.c | 25 +++--
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d84ae0e30369..0a50a08ab844 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -803,6 +803,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifaddr *ifa)
 {
struct net *net = dev_net(ifa->ifa_dev->dev);
+   int tb_id = vrf_dev_table_rtnl(ifa->ifa_dev->dev);
struct fib_table *tb;
struct fib_config cfg = {
.fc_protocol = RTPROT_KERNEL,
@@ -817,11 +818,10 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
},
};
 
-   if (type == RTN_UNICAST)
-   tb = fib_new_table(net, RT_TABLE_MAIN);
-   else
-   tb = fib_new_table(net, RT_TABLE_LOCAL);
+   if (!tb_id)
+   tb_id = (type == RTN_UNICAST) ? RT_TABLE_MAIN : RT_TABLE_LOCAL;
 
+   tb = fib_new_table(net, tb_id);
if (!tb)
return;
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 410ddb67221e..85e9a8abf15c 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -838,6 +838,23 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh)
return nh->nh_saddr;
 }
 
+static bool fib_valid_prefsrc(struct fib_config *cfg, __be32 fib_prefsrc)
+{
+   if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
+   fib_prefsrc != cfg->fc_dst) {
+   int tb_id = cfg->fc_table;
+
+   if (tb_id == RT_TABLE_MAIN)
+   tb_id = RT_TABLE_LOCAL;
+
+   if (inet_addr_type_table(cfg->fc_nlinfo.nl_net,
+fib_prefsrc, tb_id) != RTN_LOCAL) {
+   return false;
+   }
+   }
+   return true;
+}
+
 struct fib_info *fib_create_info(struct fib_config *cfg)
 {
int err;
@@ -1033,12 +1050,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
fi->fib_flags |= RTNH_F_LINKDOWN;
}
 
-   if (fi->fib_prefsrc) {
-   if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
-   fi->fib_prefsrc != cfg->fc_dst)
-   if (inet_addr_type(net, fi->fib_prefsrc) != RTN_LOCAL)
-   goto err_inval;
-   }
+   if (fi->fib_prefsrc && !fib_valid_prefsrc(cfg, fi->fib_prefsrc))
+   goto err_inval;
 
change_nexthops(fi) {
fib_info_update_nh_saddr(net, nexthop_nh);
-- 
2.3.2 (Apple Git-55)


[PATCH 04/10] udp: Handle VRF device

2015-08-05 Thread David Ahern

For unconnected UDP sockets using a VRF device, look up the source address
based on the VRF table. This allows the UDP header to be properly set up
before showing up at the VRF device via the dst.

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 net/ipv4/udp.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..b513d72a21b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -884,7 +884,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
struct rtable *rt = NULL;
int free = 0;
int connected = 0;
-   __be32 daddr, faddr, saddr;
+   __be32 daddr, faddr, saddr, vsaddr = 0;
__be16 dport;
u8  tos;
int err, is_udplite = IS_UDPLITE(sk);
@@ -1013,11 +1013,30 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
if (!rt) {
struct net *net = sock_net(sk);
+   __u8 flow_flags = inet_sk_flowi_flags(sk);
 
fl4 = &fl4_stack;
+
+   /* unconnected socket. If output device is enslaved to a VRF
+* device lookup source address from VRF table. This mimics
+* behavior of ip_route_connect{_init}.
+*/
+   if (netif_index_is_vrf(net, ipc.oif)) {
+   flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
+  RT_SCOPE_UNIVERSE, sk->sk_protocol,
+  (flow_flags | FLOWI_FLAG_VRFSRC),
+  faddr, saddr, dport, inet->inet_sport);
+
+   rt = ip_route_output_flow(net, fl4, sk);
+   if (!IS_ERR(rt)) {
+   vsaddr = fl4->saddr;
+   ip_rt_put(rt);
+   }
+   }
+
flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
   RT_SCOPE_UNIVERSE, sk->sk_protocol,
-  inet_sk_flowi_flags(sk),
+  flow_flags,
   faddr, saddr, dport, inet->inet_sport);
 
security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
@@ -1042,6 +1061,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
goto do_confirm;
 back_from_confirm:
 
+   if (vsaddr)
+   fl4->saddr = vsaddr;
saddr = fl4->saddr;
if (!ipc.addr)
daddr = ipc.addr = fl4->daddr;
-- 
2.3.2 (Apple Git-55)


[PATCH 09/10] net: Use VRF device index for socket lookups

2015-08-05 Thread David Ahern

The intent of the VRF device is to leverage the existing SO_BINDTODEVICE
as a means of creating L3 domains. Since sockets are expected to be bound
to the VRF device, the index of the master device needs to be used for
socket lookups.
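
As a rough illustration of the fallback order this patch applies to
request sockets, here is a standalone C sketch. The sketch_* helpers are
hypothetical and take plain integers instead of kernel structures; they
only mirror the order of the checks visible in the diff below.

```c
#include <stdbool.h>

/*
 * Interface index recorded on a request socket: prefer the VRF master
 * of the ingress device; if the device is not enslaved
 * (master_ifindex == 0), keep the listener's bound device.
 */
static int sketch_request_iif(int master_ifindex, int sk_bound_dev_if)
{
	return master_ifindex ? master_ifindex : sk_bound_dev_if;
}

/*
 * Device binding of the child socket: when the request arrived through
 * a VRF, the child is bound to that VRF device; otherwise it keeps the
 * binding inherited from the listener.
 */
static int sketch_child_bound_dev(int parent_bound_dev_if, int req_iif,
				  bool req_iif_is_vrf)
{
	return req_iif_is_vrf ? req_iif : parent_bound_dev_if;
}
```

Under this model an unbound listener (bound device 0) that accepts a
connection arriving through a VRF with ifindex 10 yields a child socket
bound to ifindex 10.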

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 net/ipv4/syncookies.c |  5 -
 net/ipv4/tcp_input.c  |  6 +-
 net/ipv4/tcp_ipv4.c   | 11 +--
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index d70b1f603692..e5c8b1240278 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern int sysctl_tcp_syncookies;
 
@@ -348,7 +349,9 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
treq->snt_synack= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsecr : 0;
treq->tfo_listener  = false;
 
-   ireq->ir_iif = sk->sk_bound_dev_if;
+   ireq->ir_iif = vrf_master_ifindex_by_index(sock_net(sk), skb->skb_iif);
+   if (!ireq->ir_iif)
+   ireq->ir_iif = sk->sk_bound_dev_if;
 
/* We throwed the options of the initial SYN away, so we hope
 * the ACK carries the same options again (see RFC1122 4.2.3.8)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4e4d6bcd0ca9..6b96240a4055 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6141,7 +6142,10 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
tcp_openreq_init(req, &tmp_opt, skb, sk);
 
/* Note: tcp_v6_init_req() might override ir_iif for link locals */
-   inet_rsk(req)->ir_iif = sk->sk_bound_dev_if;
+   inet_rsk(req)->ir_iif = vrf_master_ifindex_by_index(sock_net(sk),
+   skb->skb_iif);
+   if (!inet_rsk(req)->ir_iif)
+   inet_rsk(req)->ir_iif = sk->sk_bound_dev_if;
 
af_ops->init_req(req, sk, skb);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d27eb549ced6..0f8ed98a2e64 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -682,6 +683,8 @@ static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
 */
if (sk)
arg.bound_dev_if = sk->sk_bound_dev_if;
+   if (!arg.bound_dev_if && skb->dev)
+   arg.bound_dev_if = vrf_master_ifindex_rcu(skb->dev);
 
arg.tos = ip_hdr(skb)->tos;
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
@@ -766,8 +769,10 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
  ip_hdr(skb)->saddr, /* XXX */
  arg.iov[0].iov_len, IPPROTO_TCP, 0);
arg.csumoffset = offsetof(struct tcphdr, check) / 2;
-   if (oif)
-   arg.bound_dev_if = oif;
+   arg.bound_dev_if = oif ? : vrf_master_ifindex_rcu(skb_dst(skb)->dev);
+   if (!arg.bound_dev_if)
+   arg.bound_dev_if = vrf_master_ifindex_rcu(skb->dev);
+
arg.tos = tos;
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
  skb, &TCP_SKB_CB(skb)->header.h4.opt,
@@ -1269,6 +1274,8 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
ireq  = inet_rsk(req);
sk_daddr_set(newsk, ireq->ir_rmt_addr);
sk_rcv_saddr_set(newsk, ireq->ir_loc_addr);
+   if (netif_index_is_vrf(sock_net(newsk), ireq->ir_iif))
+   newsk->sk_bound_dev_if = ireq->ir_iif;
newinet->inet_saddr   = ireq->ir_loc_addr;
inet_opt  = ireq->opt;
rcu_assign_pointer(newinet->inet_opt, inet_opt);
-- 
2.3.2 (Apple Git-55)


[PATCH 10/10] net: Introduce VRF device driver

2015-08-05 Thread David Ahern

This driver borrows heavily from IPvlan and teaming drivers.

Routing domains (VRF-lite) are created by instantiating a VRF master
device with an associated table and enslaving all routed interfaces that
participate in the domain. As part of the enslavement, all connected
routes for the enslaved devices are moved to the table associated with
the VRF device. Outgoing sockets must bind to the VRF device to function.

Standard FIB rules bind the VRF device to tables and regular fib rule
processing is followed. Routed traffic through the box is forwarded by
using the VRF device as the IIF and following the IIF rule to a table
that is mated with the VRF.

Example:

   Create vrf 1:
 ip link add vrf1 type vrf table 5
 ip rule add iif vrf1 table 5
 ip rule add oif vrf1 table 5
 ip route add table 5 prohibit default
 ip link set vrf1 up

   Add interface to vrf 1:
 ip link set eth1 master vrf1

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 drivers/net/Kconfig  |   7 +
 drivers/net/Makefile |   1 +
 drivers/net/vrf.c| 715 +++
 include/net/vrf.h|   1 -
 4 files changed, 723 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vrf.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e62a9fa..e58468b02987 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -297,6 +297,13 @@ config NLMON
  diagnostics, etc. This is mostly intended for developers or support
  to debug netlink issues. If unsure, say N.
 
+config NET_VRF
+   tristate "Virtual Routing and Forwarding (Lite)"
+   depends on IP_MULTIPLE_TABLES && IPV6_MULTIPLE_TABLES
+   ---help---
+ This option enables the support for mapping interfaces into VRF's. The
+ support enables VRF devices.
+
 endif # NET_CORE
 
 config SUNGEM_PHY
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index c12cb22478a7..ca16dd689b36 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
 obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_NLMON) += nlmon.o
+obj-$(CONFIG_NET_VRF) += vrf.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
new file mode 100644
index ..75c06ee2efa3
--- /dev/null
+++ b/drivers/net/vrf.c
@@ -0,0 +1,715 @@
+/*
+ * vrf.c: device driver to encapsulate a VRF space
+ *
+ * Copyright (c) 2015 Cumulus Networks. All rights reserved.
+ * Copyright (c) 2015 Shrijeet Mukherjee 
+ * Copyright (c) 2015 David Ahern 
+ *
+ * Based on dummy, team and ipvlan drivers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME   "vrf"
+#define DRV_VERSION"1.0"
+
+#define vrf_is_slave(dev)   ((dev)->flags & IFF_SLAVE)
+
+#define vrf_master_get_rcu(dev) \
+   ((struct net_device *)rcu_dereference(dev->rx_handler_data))
+
+struct pcpu_dstats {
+   u64 tx_pkts;
+   u64 tx_bytes;
+   u64 tx_drps;
+   u64 rx_pkts;
+   u64 rx_bytes;
+   struct u64_stats_sync   syncp;
+};
+
+static struct dst_entry *vrf_ip_check(struct dst_entry *dst, u32 cookie)
+{
+   return dst;
+}
+
+static int vrf_ip_local_out(struct sk_buff *skb)
+{
+   return ip_local_out(skb);
+}
+
+static unsigned int vrf_v4_mtu(const struct dst_entry *dst)
+{
+   /* TO-DO: return max ethernet size? */
+   return dst->dev->mtu;
+}
+
+static void vrf_dst_destroy(struct dst_entry *dst)
+{
+   /* our dst lives forever - or until the device is closed */
+}
+
+static unsigned int vrf_default_advmss(const struct dst_entry *dst)
+{
+   return 65535 - 40;
+}
+
+static struct dst_ops vrf_dst_ops = {
+   .family = AF_INET,
+   .local_out  = vrf_ip_local_out,
+   .check  = vrf_ip_check,
+   .mtu= vrf_v4_mtu,
+   .destroy= vrf_dst_destroy,
+   .default_advmss = vrf_default_advmss,
+};
+
+static bool is_ip_rx_frame(struct sk_buff *skb)
+{
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   case htons(ETH_P_IPV6):
+   return true;
+   }
+   return false;
+}
+
+/* note: already called with rcu_read_lock */
+static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
+{
+   struct sk_buff *skb = *pskb;
+
+   if (is_ip_rx_frame(skb)) {
+   struct net_device *dev = vrf_master_get_rcu(skb->dev);
+

[PATCH 02/10] net: Use VRF device index for lookups on RX

2015-08-05 Thread David Ahern

On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 net/ipv4/fib_frontend.c | 8 +++-
 net/ipv4/route.c| 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6b98de0d7949..d8ced1d89f1b 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
@@ -309,7 +310,9 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
bool dev_match;
 
fl4.flowi4_oif = 0;
-   fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
+   fl4.flowi4_iif = vrf_master_ifindex_rcu(dev);
+   if (!fl4.flowi4_iif)
+   fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
fl4.daddr = src;
fl4.saddr = dst;
fl4.flowi4_tos = tos;
@@ -339,6 +342,9 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
if (nh->nh_dev == dev) {
dev_match = true;
break;
+   } else if (vrf_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) {
+   dev_match = true;
+   break;
}
}
 #else
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 18fd7c9095c7..c26ff1f7067d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -112,6 +112,7 @@
 #endif
 #include 
 #include 
+#include 
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1726,7 +1727,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 *  Now we are ready to route packet.
 */
fl4.flowi4_oif = 0;
-   fl4.flowi4_iif = dev->ifindex;
+   fl4.flowi4_iif = vrf_master_ifindex_rcu(dev) ? : dev->ifindex;
fl4.flowi4_mark = skb->mark;
fl4.flowi4_tos = tos;
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
-- 
2.3.2 (Apple Git-55)


[PATCH 05/10] net: Add inet_addr lookup by table

2015-08-05 Thread David Ahern

Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device, local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
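
The effect of making the table a parameter can be seen in a toy
userspace model. This is a sketch under stated assumptions: the route
store is a flat array rather than a FIB trie, and every sketch_* name
is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_addr_type { SKETCH_ADDR_UNICAST, SKETCH_ADDR_LOCAL };

/* One local route: the table it lives in and the address it covers. */
struct sketch_local_route {
	uint32_t tb_id;
	uint32_t addr;
};

/*
 * Classify addr by consulting only the routes stored in table tb_id.
 * An address that is local in one table is reported as plain unicast
 * when a different table is searched.
 */
static enum sketch_addr_type
sketch_addr_type_table(const struct sketch_local_route *routes, size_t n,
		       uint32_t addr, uint32_t tb_id)
{
	for (size_t i = 0; i < n; i++) {
		if (routes[i].tb_id == tb_id && routes[i].addr == addr)
			return SKETCH_ADDR_LOCAL;
	}
	return SKETCH_ADDR_UNICAST;
}
```

In this model an address whose local route sits in VRF table 5 is not
found when only table 255 (the local table) is searched, which is the
situation the new lookup is meant to handle.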

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 include/net/route.h |  1 +
 net/ipv4/fib_frontend.c | 22 +++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 94189d4bd899..6ba681f0b98d 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -189,6 +189,7 @@ void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk);
 void ip_rt_send_redirect(struct sk_buff *skb);
 
 unsigned int inet_addr_type(struct net *net, __be32 addr);
+unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id);
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr);
 void ip_rt_multicast_event(struct in_device *);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d8ced1d89f1b..b11321a8e58d 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -212,12 +212,12 @@ void fib_flush_external(struct net *net)
  */
 static inline unsigned int __inet_dev_addr_type(struct net *net,
const struct net_device *dev,
-   __be32 addr)
+   __be32 addr, int tb_id)
 {
struct flowi4   fl4 = { .daddr = addr };
struct fib_result   res;
unsigned int ret = RTN_BROADCAST;
-   struct fib_table *local_table;
+   struct fib_table *table;
 
if (ipv4_is_zeronet(addr) || ipv4_is_lbcast(addr))
return RTN_BROADCAST;
@@ -226,10 +226,10 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
 
rcu_read_lock();
 
-   local_table = fib_get_table(net, RT_TABLE_LOCAL);
-   if (local_table) {
+   table = fib_get_table(net, tb_id);
+   if (table) {
ret = RTN_UNICAST;
-   if (!fib_table_lookup(local_table, &fl4, &res, FIB_LOOKUP_NOREF)) {
+   if (!fib_table_lookup(table, &fl4, &res, FIB_LOOKUP_NOREF)) {
if (!dev || dev == res.fi->fib_dev)
ret = res.type;
}
@@ -239,16 +239,24 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
return ret;
 }
 
+unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id)
+{
+   return __inet_dev_addr_type(net, NULL, addr, tb_id);
+}
+EXPORT_SYMBOL(inet_addr_type_table);
+
 unsigned int inet_addr_type(struct net *net, __be32 addr)
 {
-   return __inet_dev_addr_type(net, NULL, addr);
+   return __inet_dev_addr_type(net, NULL, addr, RT_TABLE_LOCAL);
 }
 EXPORT_SYMBOL(inet_addr_type);
 
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr)
 {
-   return __inet_dev_addr_type(net, dev, addr);
+   int rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL;
+
+   return __inet_dev_addr_type(net, dev, addr, rt_table);
 }
 EXPORT_SYMBOL(inet_dev_addr_type);
 
-- 
2.3.2 (Apple Git-55)


[PATCH 03/10] net: Use VRF device index for lookups on TX

2015-08-05 Thread David Ahern

As with ingress, use the index of the VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather on interfaces that are part of the VRF, so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC flag is used to control this
latter part.
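
The nexthop filter that this flag relaxes can be written as a small
predicate. The following is a standalone sketch: the flag value is the
one added by this patch, while sketch_nh_usable and its integer
arguments are hypothetical stand-ins for the flowi4 and fib_nh fields.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLOWI_FLAG_VRFSRC 0x04	/* value taken from the patch */

/*
 * Decide whether a nexthop may be used for a flow.
 *
 * Without the VRFSRC flag a non-zero flow oif must equal the nexthop's
 * oif. With the flag set the oif only selected the table, so it is
 * ignored when matching nexthops inside that table.
 */
static bool sketch_nh_usable(uint8_t flow_flags, int flow_oif, int nh_oif)
{
	if (flow_flags & SKETCH_FLOWI_FLAG_VRFSRC)
		return true;

	return !flow_oif || flow_oif == nh_oif;
}
```

For a flow whose oif is the VRF device (say ifindex 10) and a nexthop
on an enslaved port (ifindex 3), the nexthop is rejected without the
flag and accepted with it.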

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 include/net/flow.h  | 1 +
 include/net/route.h | 3 +++
 net/ipv4/fib_trie.c | 7 +--
 net/ipv4/icmp.c | 4 
 net/ipv4/route.c| 5 +
 5 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 3098ae33a178..f305588fc162 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -33,6 +33,7 @@ struct flowi_common {
__u8flowic_flags;
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH0x02
+#define FLOWI_FLAG_VRFSRC  0x04
__u32   flowic_secid;
struct flowi_tunnel flowic_tun_key;
 };
diff --git a/include/net/route.h b/include/net/route.h
index 2d45f419477f..94189d4bd899 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -251,6 +251,9 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
if (inet_sk(sk)->transparent)
flow_flags |= FLOWI_FLAG_ANYSRC;
 
+   if (netif_index_is_vrf(sock_net(sk), oif))
+   flow_flags |= FLOWI_FLAG_VRFSRC;
+
flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
   protocol, flow_flags, dst, src, dport, sport);
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 37c4bb89a708..1243c79cb5b0 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1423,8 +1423,11 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
nh->nh_flags & RTNH_F_LINKDOWN &&
!(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
continue;
-   if (flp->flowi4_oif && flp->flowi4_oif != nh->nh_oif)
-   continue;
+   if (!(flp->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
+   if (flp->flowi4_oif &&
+   flp->flowi4_oif != nh->nh_oif)
+   continue;
+   }
 
if (!(fib_flags & FIB_LOOKUP_NOREF))
atomic_inc(&fi->fib_clntref);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c0556f1e4bf0..1164fc4ce3bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -96,6 +96,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Build xmit assembly blocks
@@ -425,6 +426,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
fl4.flowi4_mark = mark;
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
fl4.flowi4_proto = IPPROTO_ICMP;
+   fl4.flowi4_oif = vrf_master_ifindex_rcu(skb->dev) ? : skb->dev->ifindex;
security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
rt = ip_route_output_key(net, &fl4);
if (IS_ERR(rt))
@@ -458,6 +460,8 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_proto = IPPROTO_ICMP;
fl4->fl4_icmp_type = type;
fl4->fl4_icmp_code = code;
+   fl4->flowi4_oif = vrf_master_ifindex_rcu(skb_in->dev) ? : skb_in->dev->ifindex;
+
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
rt = __ip_route_output_key(net, fl4);
if (IS_ERR(rt))
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c26ff1f7067d..2c89d294b669 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2131,6 +2131,11 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
fl4->saddr = inet_select_addr(dev_out, 0,
  RT_SCOPE_HOST);
}
+   if (netif_is_vrf(dev_out) &&
+   !(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
+   rth = vrf_dev_get_rth(dev_out);
+   goto out;
+   }
}
 
if (!fl4->daddr) {
-- 
2.3.2 (Apple Git-55)


[PATCH 01/10] net: Introduce VRF related flags and helpers

2015-08-05 Thread David Ahern

Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.

Add link attribute for passing VRF_TABLE id.

Add vrf_ptr to netdevice.

Add various macros for determining if a device is a VRF device, the index
of the master VRF device and table associated with VRF device.
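
A simplified userspace model of the master-index helper and its typical
use is sketched below. It is an illustration only: struct sketch_dev is
hypothetical and stores a pointer to the master, whereas the patch keeps
the master's ifindex in an RCU-protected vrf_ptr.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few net_device fields that matter here. */
struct sketch_dev {
	int ifindex;
	bool is_vrf_master;
	const struct sketch_dev *master;	/* NULL when not enslaved */
};

/*
 * Returns the ifindex of the VRF master: the device's own index if it
 * is a VRF master, its master's index if it is enslaved, 0 otherwise.
 */
static int sketch_master_ifindex(const struct sketch_dev *dev)
{
	if (!dev)
		return 0;
	if (dev->is_vrf_master)
		return dev->ifindex;
	return dev->master ? dev->master->ifindex : 0;
}

/*
 * Interface index to use for a route lookup: the master's index when
 * there is one, else the device's own. dev must not be NULL.
 */
static int sketch_route_iif(const struct sketch_dev *dev)
{
	int master = sketch_master_ifindex(dev);

	return master ? master : dev->ifindex;
}
```

The second helper spells out, in portable C, the
"vrf_master_ifindex_rcu(dev) ? : dev->ifindex" pattern used by the
later patches in the series.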

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 include/linux/netdevice.h|  20 +
 include/net/vrf.h| 177 +++
 include/uapi/linux/if_link.h |   9 +++
 3 files changed, 206 insertions(+)
 create mode 100644 include/net/vrf.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f41f46f..f7a6ef2fae3a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1289,6 +1289,7 @@ enum netdev_priv_flags {
IFF_XMIT_DST_RELEASE_PERM   = 1<<22,
IFF_IPVLAN_MASTER   = 1<<23,
IFF_IPVLAN_SLAVE= 1<<24,
+   IFF_VRF_MASTER  = 1<<25,
 };
 
 #define IFF_802_1Q_VLANIFF_802_1Q_VLAN
@@ -1316,6 +1317,7 @@ enum netdev_priv_flags {
 #define IFF_XMIT_DST_RELEASE_PERM  IFF_XMIT_DST_RELEASE_PERM
 #define IFF_IPVLAN_MASTER  IFF_IPVLAN_MASTER
 #define IFF_IPVLAN_SLAVE   IFF_IPVLAN_SLAVE
+#define IFF_VRF_MASTER IFF_VRF_MASTER
 
 /**
  * struct net_device - The DEVICE structure.
@@ -1432,6 +1434,7 @@ enum netdev_priv_flags {
  * @dn_ptr:DECnet specific data
  * @ip6_ptr:   IPv6 specific data
  * @ax25_ptr:  AX.25 specific data
+ * @vrf_ptr:   VRF specific data
  * @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
  *
  * @last_rx:   Time of last Rx
@@ -1650,6 +1653,7 @@ struct net_device {
struct dn_dev __rcu *dn_ptr;
struct inet6_dev __rcu  *ip6_ptr;
void*ax25_ptr;
+   struct net_vrf_dev __rcu *vrf_ptr;
struct wireless_dev *ieee80211_ptr;
struct wpan_dev *ieee802154_ptr;
 #if IS_ENABLED(CONFIG_MPLS_ROUTING)
@@ -3808,6 +3812,22 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
return dev->priv_flags & IFF_SUPP_NOFCS;
 }
 
+static inline bool netif_is_vrf(const struct net_device *dev)
+{
+   return dev->priv_flags & IFF_VRF_MASTER;
+}
+
+static inline bool netif_index_is_vrf(struct net *net, int ifindex)
+{
+   struct net_device *dev = dev_get_by_index_rcu(net, ifindex);
+   bool rc = false;
+
+   if (dev)
+   rc = netif_is_vrf(dev);
+
+   return rc;
+}
+
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/include/net/vrf.h b/include/net/vrf.h
new file mode 100644
index ..5d4bd67a4902
--- /dev/null
+++ b/include/net/vrf.h
@@ -0,0 +1,177 @@
+/*
+ * include/net/net_vrf.h - adds vrf dev structure definitions
+ * Copyright (c) 2015 Cumulus Networks
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_NET_VRF_H
+#define __LINUX_NET_VRF_H
+
+struct net_vrf_dev {
+   struct rcu_head rcu;
+   int ifindex; /* ifindex of master dev */
+   u32 tb_id;   /* table id for VRF */
+};
+
+struct slave {
+   struct list_headlist;
+   struct net_device   *dev;
+};
+
+struct slave_queue {
+   struct list_headall_slaves;
+   int num_slaves;
+};
+
+struct net_vrf {
+   struct slave_queue  queue;
+   struct fib_table*tb;
+   struct rtable   *rth;
+   u32 tb_id;
+};
+
+
+#if IS_ENABLED(CONFIG_NET_VRF)
+/* called with rcu_read_lock() */
+static inline int vrf_master_ifindex_rcu(const struct net_device *dev)
+{
+   struct net_vrf_dev *vrf_ptr;
+   int ifindex = 0;
+
+   if (!dev)
+   return 0;
+
+   if (netif_is_vrf(dev))
+   ifindex = dev->ifindex;
+   else {
+   vrf_ptr = rcu_dereference(dev->vrf_ptr);
+   if (vrf_ptr)
+   ifindex = vrf_ptr->ifindex;
+   }
+
+   return ifindex;
+}
+
+static inline int vrf_master_ifindex(const struct net_device *dev)
+{
+   int ifindex;
+
+   rcu_read_lock();
+   ifindex = vrf_master_ifindex_rcu(dev);
+   rcu_read_unlock();
+
+   return ifindex;
+}
+
+static inline int vrf_master_ifindex_by_index(struct net *net, int ifindex)
+{
+   int rc = 0;
+
+   if (ifindex) {
+   struct net_device *dev = dev_get_by_index(net, ifindex);
+
+   if (dev) {
+   rc = vrf_master_ifindex(dev);
+

[PATCH net-next 00/10] VRF-lite - v4

2015-08-05 Thread David Ahern

In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and at the very least
needs different default gateways.

This patch set adds the ability to create virtual router domains (aka VRFs,
VRF-lite to be specific) in the Linux packet forwarding stack. The main
observation is that through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather applicable rules/tables and uniqify neighbor selection. The scheme used
needs to preserve the notions of ECMP and general routing principles.

This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.

An important point to note is that this is only a Layer 3 concept, so L2
tools (e.g., LLDP) do not need to be run in each VRF; processes can run in
unaware mode or select a VRF to talk through. Also, the behavioral model is
a generalized application of the familiar VRF-Lite model with some
performance paths that need optimization (specifically, the output route
selector that Roopa, Robert, Thomas and EricB are currently discussing on
the MPLS thread).

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
 device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

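                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    |   VRF 1      |
    |   table 5    |
    |              |
    +--------------+
    |              |
    |   VRF 2      |                             N3
    |   table 6    |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+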


Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6

ip link set vrf1 up
ip link set vrf2 up

Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2

Connected and local routes are automatically moved from main and local
tables to the VRF table.

ping using vrf1 is simply
ping -I vrf1 10.0.1.2


Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF)
then:
1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with sk_bound_dev_if.
   If a socket is not bound to a device a socket match can happen based
   on destination address, port and protocol in which case a VRF global
   or agnostic process handles the connection (i.e., this allows one listener
   socket to handle connections across VRFs). The child socket becomes
   bound to the V

[PATCH] iproute2: Add support for VRF device

2015-08-05 Thread David Ahern

Allow user to create a vrf device and specify its table binding.
Based on the iplink_vlan implementation.
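
In the patch as posted, the table argument is parsed with atoi() into a
__u32, so the "table < 0" check cannot trigger and non-numeric input is
read as 0. A stricter parser could look roughly like the standalone
sketch below; sketch_parse_table is a hypothetical helper, and the
32767 upper bound is simply the one quoted in the patch's error message.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_VRF_TABLE_MAX 32767u	/* bound from the patch's error text */

/*
 * Parse a routing table id from a command-line argument.
 * Returns 0 and stores the id in *table on success, -1 on empty,
 * negative, non-numeric, trailing-garbage or out-of-range input.
 */
static int sketch_parse_table(const char *arg, uint32_t *table)
{
	char *end;
	unsigned long val;

	if (!arg || !*arg || *arg == '-')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || val > SKETCH_VRF_TABLE_MAX)
		return -1;

	*table = (uint32_t)val;
	return 0;
}
```

Because the base passed to strtoul() is 0, hexadecimal input such as
"0x10" is also accepted under this sketch; pass 10 instead if only
decimal ids should be allowed.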

Signed-off-by: Shrijeet Mukherjee 
Signed-off-by: David Ahern 
---
 include/linux/if_link.h |  8 +
 ip/Makefile |  2 +-
 ip/iplink.c |  2 +-
 ip/iplink_vrf.c | 85 +
 4 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 ip/iplink_vrf.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b905cf7f4948..74dedf4320b8 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -338,6 +338,14 @@ enum macvlan_macaddr_mode {
 
 #define MACVLAN_FLAG_NOPROMISC 1
 
+/* VRF section */
+enum {
+   IFLA_VRF_UNSPEC,
+   IFLA_VRF_TABLE,
+   __IFLA_VRF_MAX
+};
+
+#define IFLA_VRF_MAX (__IFLA_VRF_MAX - 1)
 /* IPVLAN section */
 enum {
IFLA_IPVLAN_UNSPEC,
diff --git a/ip/Makefile b/ip/Makefile
index 77653ecc5785..d8b38ac2e44b 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
 iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
-iplink_geneve.o
+iplink_geneve.o iplink_vrf.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 369d50eab94e..14bf7211a447 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -94,7 +94,7 @@ void iplink_usage(void)
fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | 
macvlan | macvtap |\n");
fprintf(stderr, "  bridge | bond | ipoib | ip6tnl | 
ipip | sit | vxlan |\n");
fprintf(stderr, "  gre | gretap | ip6gre | ip6gretap | 
vti | nlmon |\n");
-   fprintf(stderr, "  bond_slave | ipvlan | geneve }\n");
+   fprintf(stderr, "  bond_slave | ipvlan | geneve | vrf 
}\n");
}
exit(-1);
 }
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
new file mode 100644
index ..0d7e21c7c152
--- /dev/null
+++ b/ip/iplink_vrf.c
@@ -0,0 +1,85 @@
+/* iplink_vrf.cVRF device support
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Shrijeet Mukherjee 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void vrf_explain(FILE *f)
+{
+   fprintf(f, "Usage: ... vrf table TABLEID \n");
+}
+
+static void explain(void)
+{
+   vrf_explain(stderr);
+}
+
+static int table_arg(void)
+{
+   fprintf(stderr,"Error: argument of \"table\" must be 0-32767 and 
currently unused\n");
+   return -1;
+}
+
+static int vrf_parse_opt(struct link_util *lu, int argc, char **argv,
+   struct nlmsghdr *n)
+{
+   while (argc > 0) {
+   if (matches(*argv, "table") == 0) {
+   __u32 table = 0;
+   NEXT_ARG();
+
+   table = atoi(*argv);
+   if (table < 0 || table > 32767)
+   return table_arg();
+   addattr32(n, 1024, IFLA_VRF_TABLE, table);
+   } else if (matches(*argv, "help") == 0) {
+   explain();
+   return -1;
+   } else {
+   fprintf(stderr, "vrf: unknown option \"%s\"?\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   argc--, argv++;
+   }
+
+   return 0;
+}
+
+static void vrf_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+   if (!tb)
+   return;
+
+   if (tb[IFLA_VRF_TABLE])
+   fprintf(f, "table %u ", rta_getattr_u32(tb[IFLA_VRF_TABLE]));
+}
+
+static void vrf_print_help(struct link_util *lu, int argc, char **argv,
+ FILE *f)
+{
+   vrf_explain(f);
+}
+
+struct link_util vrf_link_util = {
+   .id = "vrf",
+   .maxattr= IFLA_VRF_MAX,
+   .parse_opt  = vrf_parse_opt,
+   .print_opt  = vrf_print_opt,
+   .print_help = vrf_print_help,
+};
-- 
2.3.2 (Apple Git-55)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
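
For illustration, a user-space sketch of stricter argument checking than
the atoi() call in vrf_parse_opt() above. In the patch, table is a __u32,
so the "table < 0" test can never be true, and atoi() turns non-numeric
input into 0 without reporting an error. The helper name parse_table_id()
and the VRF_TABLE_MAX macro are hypothetical and not part of iproute2; the
32767 limit is taken from the patch's own error message.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define VRF_TABLE_MAX 32767UL	/* upper bound from the patch's error text */

/* Hypothetical helper, not part of iproute2.
 * Returns 0 and stores the id in *table on success, -1 on bad input.
 */
int parse_table_id(const char *arg, uint32_t *table)
{
	char *end;
	unsigned long v;

	/* Require a leading digit: strtoul() would otherwise accept
	 * leading whitespace and a sign, and "-1" would wrap around.
	 */
	if (!arg || !isdigit((unsigned char)*arg))
		return -1;

	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno || *end != '\0' || v > VRF_TABLE_MAX)
		return -1;

	*table = (uint32_t)v;
	return 0;
}
```

With a helper of this shape, "table foo", "table 12x" and "table -1" would
all be rejected instead of being silently mapped to some table id.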

[PATCH 08/10] net: Use passed in table for nexthop lookups

2015-08-05 Thread David Ahern

If a user passes in a table for new routes, use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table but only in another table, and a subsequent
route is then added with a next hop that uses the connected route, e.g.:

$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
169.254.0.0/16 dev eth0  scope link  metric 1003
192.168.56.0/24 dev eth1  proto kernel  scope link  src 192.168.56.51

$ ip route ls table 10
1.1.1.0/24 dev eth2  scope link

Without this patch adding a nexthop route fails:

$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable

With this patch the route is added successfully.

Signed-off-by: David Ahern 
---
 net/ipv4/fib_semantics.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 85e9a8abf15c..b7f1d20a9615 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -691,6 +691,7 @@ static int fib_check_nh(struct fib_config *cfg, struct 
fib_info *fi,
}
rcu_read_lock();
{
+   struct fib_table *tbl = NULL;
struct flowi4 fl4 = {
.daddr = nh->nh_gw,
.flowi4_scope = cfg->fc_scope + 1,
@@ -701,8 +702,16 @@ static int fib_check_nh(struct fib_config *cfg, struct 
fib_info *fi,
/* It is not necessary, but requires a bit of thinking 
*/
if (fl4.flowi4_scope < RT_SCOPE_LINK)
fl4.flowi4_scope = RT_SCOPE_LINK;
-   err = fib_lookup(net, &fl4, &res,
-FIB_LOOKUP_IGNORE_LINKSTATE);
+
+   if (cfg->fc_table)
+   tbl = fib_get_table(net, cfg->fc_table);
+
+   if (tbl)
+   err = fib_table_lookup(tbl, &fl4, &res,
+  FIB_LOOKUP_IGNORE_LINKSTATE);
+   else
+   err = fib_lookup(net, &fl4, &res,
+FIB_LOOKUP_IGNORE_LINKSTATE);
if (err) {
rcu_read_unlock();
return err;
-- 
2.3.2 (Apple Git-55)
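
The table selection in fib_check_nh() above can be modelled in user space
roughly as follows. This is a toy sketch, not kernel code: all toy_* names
and the IP4() macro are hypothetical, the default fib_lookup() path is
simplified to "consult the main table only", and only link-scope routes
from the commit message's example are modelled, since the nexthop lookup
is done with a scope of at least RT_SCOPE_LINK.

```c
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))
#define TOY_TABLE_MAIN 254	/* same value as RT_TABLE_MAIN */

struct toy_route { uint32_t prefix, mask; };
struct toy_table { uint32_t id; const struct toy_route *routes; size_t n; };

/* Link-scope routes from the example in the commit message. */
static const struct toy_route toy_main_routes[] = {
	{ IP4(10, 0, 2, 0),     0xffffff00 },
	{ IP4(169, 254, 0, 0),  0xffff0000 },
	{ IP4(192, 168, 56, 0), 0xffffff00 },
};
static const struct toy_route toy_t10_routes[] = {
	{ IP4(1, 1, 1, 0), 0xffffff00 },
};
static const struct toy_table toy_tables[] = {
	{ TOY_TABLE_MAIN, toy_main_routes, 3 },
	{ 10,             toy_t10_routes,  1 },
};

static const struct toy_table *toy_get_table(uint32_t id)
{
	size_t i;

	for (i = 0; i < sizeof(toy_tables) / sizeof(toy_tables[0]); i++)
		if (toy_tables[i].id == id)
			return &toy_tables[i];
	return NULL;
}

/* Returns 0 if the gateway is covered by a route in the table, else -1. */
static int toy_table_lookup(const struct toy_table *t, uint32_t gw)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if ((gw & t->routes[i].mask) == t->routes[i].prefix)
			return 0;
	return -1;
}

/* Prefer the table passed in by the user; fall back to the default path. */
int toy_check_nh(uint32_t cfg_table, uint32_t gw)
{
	const struct toy_table *tbl = NULL;

	if (cfg_table)
		tbl = toy_get_table(cfg_table);
	if (!tbl)
		tbl = toy_get_table(TOY_TABLE_MAIN);

	return tbl ? toy_table_lookup(tbl, gw) : -1;
}
```

In this model, toy_check_nh(10, IP4(1, 1, 1, 10)) succeeds because table 10
holds 1.1.1.0/24, while the same gateway with no table given fails, which
mirrors the "Network is unreachable" case from the commit message.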


Re: [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-05 Thread Wei Liu

On Tue, Aug 04, 2015 at 07:12:48PM +0100, Julien Grall wrote:
[...]
> diff --git a/drivers/net/xen-netback/netback.c 
> b/drivers/net/xen-netback/netback.c
> index 7d50711..3b7b7c3 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -314,7 +314,7 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
> *queue, struct sk_buff *skb
>   } else {
>   copy_gop->source.domid = DOMID_SELF;
>   copy_gop->source.u.gmfn =
> - virt_to_mfn(page_address(page));
> + virt_to_gfn(page_address(page));
>   }
>   copy_gop->source.offset = offset;
>  
> @@ -1284,7 +1284,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue 
> *queue,
>   queue->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
>  
>   queue->tx_copy_ops[*copy_ops].dest.u.gmfn =
> - virt_to_mfn(skb->data);
> + virt_to_gfn(skb->data);
>   queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
>   queue->tx_copy_ops[*copy_ops].dest.offset =
>   offset_in_page(skb->data);

Acked-by: Wei Liu 

Re: [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-05 Thread Dmitry Torokhov

On Wed, Aug 05, 2015 at 11:08:55AM +0100, Stefano Stabellini wrote:
> On Tue, 4 Aug 2015, Julien Grall wrote:
> > Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
> > is meant, I suspect this is because the first support for Xen was for
> > PV. This resulted in some misimplementation of helpers on ARM and
> > confused developers about the expected behavior.
> > 
> > For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
> > Although, if we look at the implementation on x86, it's returning a GFN.
> > 
> > For clarity and avoid new confusion, replace any reference to mfn with
> > gfn in any helpers used by PV drivers. The x86 code will still keep some
> > reference of pfn_to_mfn but exclusively for PV (a BUG_ON has been added
> > to ensure this). No changes as been made in the hypercall field, even
> > though they may be invalid, in order to keep the same as the defintion
> > in xen repo.
> > 
> > Take also the opportunity to simplify simple construction such
> > as pfn_to_mfn(page_to_pfn(page)) into page_to_gfn. More complex clean up
> > will come in follow-up patches.
> > 
> > [1] 
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb
> > 
> > Signed-off-by: Julien Grall 
> > Cc: Stefano Stabellini 
> > Cc: Russell King 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Boris Ostrovsky 
> > Cc: David Vrabel 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: x...@kernel.org
> > Cc: "Roger Pau Monné" 
> > Cc: Dmitry Torokhov 
> > Cc: Ian Campbell 
> > Cc: Wei Liu 
> > Cc: Juergen Gross 
> > Cc: "James E.J. Bottomley" 
> > Cc: Greg Kroah-Hartman 
> > Cc: Jiri Slaby 
> > Cc: Jean-Christophe Plagniol-Villard 
> > Cc: Tomi Valkeinen 
> > Cc: linux-in...@vger.kernel.org
> > Cc: netdev@vger.kernel.org
> > Cc: linux-s...@vger.kernel.org
> > Cc: linuxppc-...@lists.ozlabs.org
> > Cc: linux-fb...@vger.kernel.org
> > Cc: linux-arm-ker...@lists.infradead.org
> 
> Aside from the x86 bits:
> 
> Reviewed-by: Stefano Stabellini 

Not really important, but just in case anyone waits for my ack on input
bits:

Acked-by: Dmitry Torokhov 

Thanks.

-- 
Dmitry

Re: [PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Stephen Hemminger

On Wed,  5 Aug 2015 16:50:54 +0100
Liviu Dudau  wrote:

> For designs where EEPROMs are not connected to PCI Yukon2
> chips we need to get the MAC address from the firmware.
> Add a module parameter called 'mac_address' for this. It
> will be used if no DT node can be found and the B2_MAC
> register holds an invalid value.
> 
> Signed-off-by: Liviu Dudau 

Yes, I can see that this can be a real problem, and other drivers
already solve it. The standard method is to assign a random MAC address
(and then let scripts overwrite it) rather than introducing a module
parameter. Module parameters are discouraged because they are device
specific.
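
A user-space sketch of the fallback order being discussed: device tree
first, then the register contents, then a random address. All mac_* names
are hypothetical. The validity test mirrors what is_valid_ether_addr()
checks (not all-zero, not multicast), and rand() is only a placeholder for
a proper random source; in the kernel the usual helper for this is
eth_hw_addr_random().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Valid means: not all zeroes and not a multicast address. */
bool mac_is_valid(const uint8_t a[MAC_LEN])
{
	static const uint8_t zero[MAC_LEN];

	return !(a[0] & 0x01) && memcmp(a, zero, MAC_LEN) != 0;
}

/* Random unicast, locally administered address. */
void mac_random(uint8_t a[MAC_LEN])
{
	int i;

	for (i = 0; i < MAC_LEN; i++)
		a[i] = (uint8_t)(rand() & 0xff);	/* placeholder RNG */
	a[0] &= 0xfe;	/* clear the multicast bit */
	a[0] |= 0x02;	/* set the locally administered bit */
}

/* dt may be NULL when no device tree address exists; reg holds the
 * bytes read back from the hardware register.
 */
void mac_pick(uint8_t out[MAC_LEN], const uint8_t *dt, const uint8_t *reg)
{
	if (dt)
		memcpy(out, dt, MAC_LEN);
	else if (reg && mac_is_valid(reg))
		memcpy(out, reg, MAC_LEN);
	else
		mac_random(out);
}
```

The random address always has the locally administered bit set, so it
cannot collide with a vendor-assigned address, and user space can still
replace it later.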


[PATCH net-next] net: Fix race condition in store_rps_map

2015-08-05 Thread Tom Herbert

There is a race condition in store_rps_map that allows the jump label
count in rps_needed to go below zero. This can happen when
concurrently attempting to set and clear a map.

Scenario:

1. rps_needed count is zero
2. New map is assigned by the setting thread, but rps_needed count is
   _not_ yet incremented (rps_needed count still zero)
3. Map is cleared by a second thread, old_map is set to the map just
   assigned
4. Second thread performs static_key_slow_dec, rps_needed count now goes
   negative

The fix is to increment or decrement rps_needed under the spinlock.

Signed-off-by: Tom Herbert 
---
 net/core/net-sysfs.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 194c1d0..39ec694 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -726,14 +726,17 @@ static ssize_t store_rps_map(struct netdev_rx_queue 
*queue,
old_map = rcu_dereference_protected(queue->rps_map,
lockdep_is_held(&rps_map_lock));
rcu_assign_pointer(queue->rps_map, map);
-   spin_unlock(&rps_map_lock);
 
if (map)
static_key_slow_inc(&rps_needed);
-   if (old_map) {
-   kfree_rcu(old_map, rcu);
+   if (old_map)
static_key_slow_dec(&rps_needed);
-   }
+
+   spin_unlock(&rps_map_lock);
+
+   if (old_map)
+   kfree_rcu(old_map, rcu);
+
free_cpumask_var(mask);
return len;
 }
-- 
1.8.5.6
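
The interleaving from the scenario can be replayed deterministically in
user space. The sketch below is single-threaded and only models the order
of operations; every toy_* name is hypothetical, a plain int stands in for
the jump label count, and no real locking is performed.

```c
#include <stddef.h>

struct toy_queue { void *map; };

int toy_rps_needed;	/* stand-in for the rps_needed count */
int toy_lowest;		/* lowest value the counter ever reached */

static void toy_count(int delta)
{
	toy_rps_needed += delta;
	if (toy_rps_needed < toy_lowest)
		toy_lowest = toy_rps_needed;
}

/* The pointer swap that store_rps_map() performs under rps_map_lock. */
static void *toy_swap(struct toy_queue *q, void *map)
{
	void *old = q->map;

	q->map = map;
	return old;
}

/* Old ordering: the counter is updated after the unlock, so thread B's
 * decrement can run before thread A's increment.
 */
int toy_replay_unlocked(void)
{
	struct toy_queue q = { NULL };
	int dummy_map;
	void *old_a, *old_b;

	toy_rps_needed = toy_lowest = 0;
	old_a = toy_swap(&q, &dummy_map);	/* A: assign map, unlock */
	old_b = toy_swap(&q, NULL);		/* B: clear map, unlock */
	if (old_b)
		toy_count(-1);			/* B: decrement runs first */
	toy_count(+1);				/* A: increment for its map */
	(void)old_a;				/* A replaced nothing */
	return toy_lowest;
}

/* New ordering: swap and counter update form one critical section. */
static void toy_store_locked(struct toy_queue *q, void *map)
{
	void *old = toy_swap(q, map);

	if (map)
		toy_count(+1);
	if (old)
		toy_count(-1);
}

int toy_replay_locked(void)
{
	struct toy_queue q = { NULL };
	int dummy_map;

	toy_rps_needed = toy_lowest = 0;
	toy_store_locked(&q, &dummy_map);	/* A: set */
	toy_store_locked(&q, NULL);		/* B: clear */
	return toy_lowest;
}
```

In this model toy_replay_unlocked() sees the counter dip to -1, while
toy_replay_locked() never drops below zero; both end with a count of zero.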


Re: [Xen-devel] printk from softirq on xen: hard lockup

2015-08-05 Thread Jason A. Donenfeld

Hi folks,

I have written an extremely simple reproducer. Xen 4.5.1. Linux 4.1.3.
Config attached. Reproducer attached. Makefile attached.

It results in the COMPLETE lockup of the system when it receives a
network packet over the Xen PV network interface.

The lockup is 100% reliable. As in the messages above, it puts this --
"while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)" into a busy
loop that never exits.

It is triggered by a simple printk in softirq.

Thanks,
Jason


Makefile
Description: Binary data
#include 
#include 
#include 
#include 
#include 

static struct socket *s = NULL;

static int receive(struct sock *sk, struct sk_buff *skb)
{
	net_info_ratelimited("The printing of this message will crash a Xen PV guest.\n");
	dev_kfree_skb(skb);
	return 0;
}

static int __init mod_init(void)
{
	int ret;
	struct udp_port_cfg port = {
		.family = AF_INET,
		.local_ip = { htonl(INADDR_ANY) },
		.local_udp_port = htons(32812),
		.use_udp_checksums = 1
	};
	struct udp_tunnel_sock_cfg tunnel = {
		.encap_type = 1,
		.encap_rcv = receive
	};
	ret = udp_sock_create4(&init_net, &port, &s);
	if (ret)
		return ret;
	setup_udp_tunnel_sock(&init_net, s, &tunnel);
	return ret;
}

static void __exit mod_exit(void)
{
	if (s)
		udp_tunnel_sock_release(s);
}

module_init(mod_init);
module_exit(mod_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Send a UDP packet to port 32812");
MODULE_AUTHOR("Jason A. Donenfeld ");


4.1.3-domU-config
Description: Binary data

Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter

2015-08-05 Thread Peter Zijlstra

On Wed, Aug 05, 2015 at 09:08:32AM -0700, Alexei Starovoitov wrote:
> On 8/5/15 6:53 AM, Peter Zijlstra wrote:
> >+/*
> >+ * If the event is currently on this CPU, its either a per-task event,
> >+ * or local to this CPU. Furthermore it means its ACTIVE (otherwise
> >+ * oncpu == -1).
> >+ */
> >+if (event->oncpu == smp_processor_id())
> >+event->pmu->read(event);
> >+
> >+val = local64_read(&event->count);
> >+local_irq_restore(flags);
> >+
> 
> nice! cleaner and faster.
> so raw_spin_lock(&ctx->lock) is not needed, because
> update_*(event) methods are not called, right?

Indeed, and by ensuring the event is indeed local (by force of WARN_ON)
disabling IRQs will avoid counter scheduling and result in a stable
event state.



Re: [PATCH net-next] vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA

2015-08-05 Thread Alexei Starovoitov


On 8/5/15 7:42 AM, Thomas Graf wrote:
> I have no objection since a flag to enable tx only can be added again if
> needed. As stated in the other thread, the tx only mode which is what
> VXLAN was capable of doing so far is what motivated the split of flags.

I see the intent. A bit confusing though, since today FLOWBASED
affects pieces of TX and RX and COLLECT_METADATA another piece of RX.
After the patch COLLECT_METADATA does it all. imo cleaner.
but yeah, as you said, we can add 'tx only' later if really needed.
thanks for review!


[PATCH net-next] net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture

2015-08-05 Thread clsoto

From: Carol L Soto 

The driver failed to configure the UAR page size for architectures with
a page size other than 4K. Set log_uar_page_sz in the HCA capabilities
accordingly.

Signed-off-by: Carol L Soto 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 603a8b0..03aabdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -391,6 +391,8 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
+   MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
+
err = set_caps(dev, set_ctx, set_sz);
 
 query_ex:
-- 
1.8.3.1
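
A small worked example of the value being programmed. It assumes, as the
expression PAGE_SHIFT - 12 in the patch suggests, that log_uar_page_sz is
the log2 of the page size expressed in 4 KiB units; the function names are
hypothetical and exist only for this sketch.

```c
/* log2 of the page size in 4 KiB units: 0 for 4K pages, 4 for 64K pages. */
unsigned int toy_log_uar_page_sz(unsigned int page_shift)
{
	return page_shift - 12;
}

/* Inverse mapping, for checking: bytes covered by one such page. */
unsigned long toy_uar_page_bytes(unsigned int log_sz)
{
	return 4096UL << log_sz;
}
```

On a system with 4K pages (PAGE_SHIFT of 12) the field stays 0, which
matches the previous behaviour; with 64K pages (PAGE_SHIFT of 16) it
becomes 4.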


Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter

2015-08-05 Thread Alexei Starovoitov


On 8/5/15 6:53 AM, Peter Zijlstra wrote:
> +   /*
> +    * If the event is currently on this CPU, its either a per-task event,
> +    * or local to this CPU. Furthermore it means its ACTIVE (otherwise
> +    * oncpu == -1).
> +    */
> +   if (event->oncpu == smp_processor_id())
> +           event->pmu->read(event);
> +
> +   val = local64_read(&event->count);
> +   local_irq_restore(flags);
> +

nice! cleaner and faster.
so raw_spin_lock(&ctx->lock) is not needed, because
update_*(event) methods are not called, right?


Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter

2015-08-05 Thread Alexei Starovoitov


On 8/5/15 3:15 AM, Peter Zijlstra wrote:
> On Wed, Aug 05, 2015 at 12:04:25PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 04, 2015 at 08:58:15AM +, Kaixu Xia wrote:
> > > +   event->ctx->task != current)
>
> Strictly speaking we should hold rcu_read_lock around dereferencing
> event->ctx (or have IRQs disabled -- although I know Paul doesn't like
> us relying on that).

programs are always executing under rcu_read_lock, so we should be good
here too.


Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter

2015-08-05 Thread Alexei Starovoitov


On 8/5/15 3:04 AM, Peter Zijlstra wrote:

> > +   __perf_event_read(event);
> > +   return perf_event_count(event);
> > +}
>
> Also, you probably want a WARN_ON(in_nmi()) there, this function is
> _NOT_ NMI safe.

we check that very early on:

unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
{
	unsigned int ret;

	if (in_nmi()) /* not supported yet */
		return 1;
	...


[PATCH] sky2: Add module parameter for passing the MAC address

2015-08-05 Thread Liviu Dudau

For designs where EEPROMs are not connected to PCI Yukon2
chips we need to get the MAC address from the firmware.
Add a module parameter called 'mac_address' for this. It
will be used if no DT node can be found and the B2_MAC
register holds an invalid value.

Signed-off-by: Liviu Dudau 
---
 drivers/net/ethernet/marvell/sky2.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index d9f4498..a977d95 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -101,6 +101,10 @@ static int legacy_pme = 0;
 module_param(legacy_pme, int, 0);
 MODULE_PARM_DESC(legacy_pme, "Legacy power management");
 
+/* Ugh!  Let the firmware tell us the hardware address */
+static int mac_address[ETH_ALEN] = { 0, };
+module_param_array(mac_address, int, NULL, 0);
+
 static const struct pci_device_id sky2_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) }, /* SK-9Sxx */
{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) }, /* SK-9Exx */
@@ -4811,13 +4815,21 @@ static struct net_device *sky2_init_netdev(struct 
sky2_hw *hw, unsigned port,
/* try to get mac address in the following order:
 * 1) from device tree data
 * 2) from internal registers set by bootloader
+* 3) from the command line parameter
 */
iap = of_get_mac_address(hw->pdev->dev.of_node);
if (iap)
memcpy(dev->dev_addr, iap, ETH_ALEN);
-   else
+   else {
memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 + port * 8,
  ETH_ALEN);
+   if (!is_valid_ether_addr(&dev->dev_addr[0])) {
+   int i;
+
+   for (i = 0; i < ETH_ALEN; i++)
+   dev->dev_addr[i] = mac_address[i];
+   }
+   }
 
return dev;
 }
-- 
2.4.6


[v2 1/9] devres: add devm_alloc_percpu()

2015-08-05 Thread Madalin Bucur

Introduce managed counterparts for alloc_percpu() and free_percpu().
Add devm_alloc_percpu() and devm_free_percpu() into the managed
interfaces list.

Signed-off-by: Madalin Bucur 
---
 Documentation/driver-model/devres.txt |  4 +++
 drivers/base/devres.c | 64 +++
 include/linux/device.h| 19 +++
 3 files changed, 87 insertions(+)

diff --git a/Documentation/driver-model/devres.txt 
b/Documentation/driver-model/devres.txt
index 831a536..595fd1b 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -312,6 +312,10 @@ MEM
   devm_kvasprintf()
   devm_kzalloc()
 
+PER-CPU MEM
+  devm_alloc_percpu()
+  devm_free_percpu()
+
 PCI
   pcim_enable_device() : after success, all PCI ops become managed
   pcim_pin_device(): keep PCI device enabled after release
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index c8a53d1..deb2ea0 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "base.h"
 
@@ -984,3 +985,66 @@ void devm_free_pages(struct device *dev, unsigned long 
addr)
   &devres));
 }
 EXPORT_SYMBOL_GPL(devm_free_pages);
+
+static void devm_percpu_release(struct device *dev, void *pdata)
+{
+   void __percpu *p;
+
+   p = *(void __percpu **)pdata;
+   free_percpu(p);
+}
+
+static int devm_percpu_match(struct device *dev, void *data, void *p)
+{
+   struct devres *devr = container_of(data, struct devres, data);
+
+   return *(void **)devr->data == p;
+}
+
+/**
+ * __devm_alloc_percpu - Resource-managed alloc_percpu
+ * @dev: Device to allocate per-cpu memory for
+ * @size: Size of per-cpu memory to allocate
+ * @align: Alignement of per-cpu memory to allocate
+ *
+ * Managed alloc_percpu. Per-cpu memory allocated with this function is
+ * automatically freed on driver detach.
+ *
+ * RETURNS:
+ * Pointer to allocated memory on success, NULL on failure.
+ */
+void __percpu *__devm_alloc_percpu(struct device *dev, size_t size,
+   size_t align)
+{
+   void *p;
+   void __percpu *pcpu;
+
+   pcpu = __alloc_percpu(size, align);
+   if (!pcpu)
+   return NULL;
+
+   p = devres_alloc(devm_percpu_release, sizeof(void *), GFP_KERNEL);
+   if (!p)
+   return NULL;
+
+   *(void __percpu **)p = pcpu;
+
+   devres_add(dev, p);
+
+   return pcpu;
+}
+EXPORT_SYMBOL_GPL(__devm_alloc_percpu);
+
+/**
+ * devm_free_percpu - Resource-managed free_percpu
+ * @dev: Device this memory belongs to
+ * @pdata: Per-cpu memory to free
+ *
+ * Free memory allocated with devm_alloc_percpu().
+ */
+void devm_free_percpu(struct device *dev, void __percpu *pdata)
+{
+   WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match,
+  (void *)pdata));
+}
+EXPORT_SYMBOL_GPL(devm_free_percpu);
diff --git a/include/linux/device.h b/include/linux/device.h
index a2b4ea7..126c25b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -673,6 +673,25 @@ void __iomem *devm_ioremap_resource(struct device *dev, 
struct resource *res);
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
 void devm_remove_action(struct device *dev, void (*action)(void *), void 
*data);
 
+/**
+ * devm_alloc_percpu - Resource-managed alloc_percpu
+ * @dev: Device to allocate per-cpu memory for
+ * @type: Type to allocate per-cpu memory for
+ *
+ * Managed alloc_percpu. Per-cpu memory allocated with this function is
+ * automatically freed on driver detach.
+ *
+ * RETURNS:
+ * Pointer to allocated memory on success, NULL on failure.
+ */
+#define devm_alloc_percpu(dev, type)  \
+   (typeof(type) __percpu *)__devm_alloc_percpu(dev, sizeof(type), \
+__alignof__(type))
+
+void __percpu *__devm_alloc_percpu(struct device *dev, size_t size,
+  size_t align);
+void devm_free_percpu(struct device *dev, void __percpu *pdata);
+
 struct device_dma_parameters {
/*
 * a low level driver may set these to teach IOMMU code about
-- 
1.7.11.7
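
The managed-resource pattern behind devm_alloc_percpu() can be sketched in
user space as below. This is an analogy only: the toy_* names are
hypothetical, malloc() stands in for __alloc_percpu(), and a linked list
stands in for the devres list. One detail the sketch makes explicit: if
the bookkeeping allocation fails, the payload already allocated has to be
freed. In __devm_alloc_percpu() as posted, the early return after a failed
devres_alloc() does not appear to call free_percpu() on the memory already
obtained.

```c
#include <stdlib.h>

struct toy_res {
	struct toy_res *next;
	void (*release)(void *payload);
	void *payload;
};

struct toy_dev { struct toy_res *head; };

int toy_released;	/* number of payloads released so far */

/* Release callback that counts how often it runs. */
void toy_counting_free(void *payload)
{
	toy_released++;
	free(payload);
}

/* Allocate a payload and tie its lifetime to the device. */
void *toy_devm_alloc(struct toy_dev *dev, size_t size,
		     void (*release)(void *payload))
{
	void *payload = malloc(size);
	struct toy_res *r;

	if (!payload)
		return NULL;

	r = malloc(sizeof(*r));
	if (!r) {
		free(payload);	/* do not leak the payload */
		return NULL;
	}

	r->release = release;
	r->payload = payload;
	r->next = dev->head;
	dev->head = r;
	return payload;
}

/* On detach, release everything that was registered, newest first. */
void toy_dev_detach(struct toy_dev *dev)
{
	while (dev->head) {
		struct toy_res *r = dev->head;

		dev->head = r->next;
		r->release(r->payload);
		free(r);
	}
}
```

A caller allocates through toy_devm_alloc() and never frees explicitly;
toy_dev_detach() plays the role of the automatic release on driver detach.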


[v2 4/9] dpaa_eth: add driver's Tx queue selection mechanism

2015-08-05 Thread Madalin Bucur

Allow the selection of the transmission queue based on the CPU id.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Kconfig   | 10 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c|  3 +++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h|  6 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c |  8 
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h |  4 
 5 files changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig 
b/drivers/net/ethernet/freescale/dpaa/Kconfig
index 1f3a203..6147403 100644
--- a/drivers/net/ethernet/freescale/dpaa/Kconfig
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -11,6 +11,16 @@ menuconfig FSL_DPAA_ETH
 
 if FSL_DPAA_ETH
 
+config FSL_DPAA_ETH_USE_NDO_SELECT_QUEUE
+   bool "Use driver's Tx queue selection mechanism"
+   default y
+   ---help---
+ The DPAA-Ethernet driver defines a ndo_select_queue() callback for 
optimal selection
+ of the egress FQ. That will override the XPS support for this 
netdevice.
+ If for whatever reason you want to be in control of the egress 
FQ-to-CPU selection and mapping,
+ or simply don't want to use the driver's ndo_select_queue() callback, 
then unselect this
+ and use the standard XPS support instead.
+
 config FSL_DPAA_CS_THRESHOLD_1G
hex "Egress congestion threshold on 1G ports"
range 0x1000 0x1000
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 53c37cd..264945c 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -387,6 +387,9 @@ static const struct net_device_ops dpa_private_ops = {
.ndo_get_stats64 = dpa_get_stats64,
.ndo_set_mac_address = dpa_set_mac_address,
.ndo_validate_addr = eth_validate_addr,
+#ifdef CONFIG_FSL_DPAA_ETH_USE_NDO_SELECT_QUEUE
+   .ndo_select_queue = dpa_select_queue,
+#endif
.ndo_change_mtu = dpa_change_mtu,
.ndo_set_rx_mode = dpa_set_rx_mode,
.ndo_init = dpa_ndo_init,
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index d337dcc..55c1106 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -430,9 +430,15 @@ static inline void _dpa_assign_wq(struct dpa_fq *fq)
}
 }
 
+#ifdef CONFIG_FSL_DPAA_ETH_USE_NDO_SELECT_QUEUE
+/* Use in lieu of skb_get_queue_mapping() */
+#define dpa_get_queue_mapping(skb) \
+   raw_smp_processor_id()
+#else
 /* Use the queue selected by XPS */
 #define dpa_get_queue_mapping(skb) \
skb_get_queue_mapping(skb)
+#endif
 
 static inline void _dpa_bp_free_pf(void *addr)
 {
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
index 7e4b9bd..1258683 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
@@ -586,6 +586,14 @@ bool dpa_bpid2pool_use(int bpid)
return false;
 }
 
+#ifdef CONFIG_FSL_DPAA_ETH_USE_NDO_SELECT_QUEUE
+u16 dpa_select_queue(struct net_device *net_dev, struct sk_buff *skb,
+void *accel_priv, select_queue_fallback_t fallback)
+{
+   return dpa_get_queue_mapping(skb);
+}
+#endif
+
 struct dpa_fq *dpa_fq_alloc(struct device *dev,
const struct fqid_cell *fqids,
struct list_head *list,
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
index bd88dda..4581bfc 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
@@ -88,6 +88,10 @@ struct dpa_bp *dpa_bpid2pool(int bpid);
 void dpa_bpid2pool_map(int bpid, struct dpa_bp *dpa_bp);
 bool dpa_bpid2pool_use(int bpid);
 void dpa_bp_drain(struct dpa_bp *bp);
+#ifdef CONFIG_FSL_DPAA_ETH_USE_NDO_SELECT_QUEUE
+u16 dpa_select_queue(struct net_device *net_dev, struct sk_buff *skb,
+void *accel_priv, select_queue_fallback_t fallback);
+#endif
 struct dpa_fq *dpa_fq_alloc(struct device *dev,
const struct fqid_cell *fqids,
struct list_head *list,
-- 
1.7.11.7


[v2 3/9] dpaa_eth: add support for S/G frames

2015-08-05 Thread Madalin Bucur

Add support for Scatter/Gather (S/G) frames. The FMan can place
the frame content into multiple buffers and provide an S/G Table
(SGT) in the first buffer with references to the others.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |   6 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  |  47 ++-
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.h  |   2 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c  | 335 +++--
 4 files changed, 370 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 9e83bd1..53c37cd 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -460,6 +460,12 @@ static int dpa_private_netdev_init(struct net_device 
*net_dev)
net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
NETIF_F_LLTX);
 
+   /* Advertise S/G and HIGHDMA support for private interfaces */
+   net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA;
+   /* Recent kernels enable GSO automatically, if
+* we declare NETIF_F_SG. For conformity, we'll
+* still declare GSO explicitly.
+*/
net_dev->features |= NETIF_F_GSO;
 
return dpa_netdev_init(net_dev, mac_addr, tx_timeout);
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
index 10f08f7..7e4b9bd 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
@@ -1122,6 +1122,35 @@ void dpaa_eth_init_ports(struct mac_device *mac_dev,
  port_fqs->rx_defq, &buf_layout[RX]);
 }
 
+void dpa_release_sgt(struct qm_sg_entry *sgt)
+{
+   struct dpa_bp *dpa_bp;
+   struct bm_buffer bmb[DPA_BUFF_RELEASE_MAX];
+   u8 i = 0, j;
+
+   memset(bmb, 0, sizeof(bmb));
+
+   do {
+   dpa_bp = dpa_bpid2pool(sgt[i].bpid);
+   DPA_ERR_ON(!dpa_bp);
+
+   j = 0;
+   do {
+   DPA_ERR_ON(sgt[i].extension);
+
+   bmb[j].hi = sgt[i].addr_hi;
+   bmb[j].lo = sgt[i].addr_lo;
+
+   j++; i++;
+   } while (j < ARRAY_SIZE(bmb) &&
+   !sgt[i - 1].final &&
+   sgt[i - 1].bpid == sgt[i].bpid);
+
+   while (bman_release(dpa_bp->pool, bmb, j, 0))
+   cpu_relax();
+   } while (!sgt[i - 1].final);
+}
+
 void __attribute__((nonnull))
 dpa_fd_release(const struct net_device *net_dev, const struct qm_fd *fd)
 {
@@ -1137,7 +1166,23 @@ dpa_fd_release(const struct net_device *net_dev, const 
struct qm_fd *fd)
dpa_bp = dpa_bpid2pool(fd->bpid);
DPA_ERR_ON(!dpa_bp);
 
-   DPA_ERR_ON(fd->format == qm_fd_sg);
+   if (fd->format == qm_fd_sg) {
+   vaddr = phys_to_virt(fd->addr);
+   sgt = vaddr + dpa_fd_offset(fd);
+
+   dma_unmap_single(dpa_bp->dev, qm_fd_addr(fd), dpa_bp->size,
+DMA_BIDIRECTIONAL);
+
+   dpa_release_sgt(sgt);
+
+   addr = dma_map_single(dpa_bp->dev, vaddr, dpa_bp->size,
+ DMA_BIDIRECTIONAL);
+   if (dma_mapping_error(dpa_bp->dev, addr)) {
+   dev_err(dpa_bp->dev, "DMA mapping failed");
+   return;
+   }
+   bm_buffer_set64(&bmb, addr);
+   }
 
while (bman_release(dpa_bp->pool, &bmb, 1, 0))
cpu_relax();
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
index 93fcf82..bd88dda 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
@@ -52,6 +52,7 @@
fm_set_##type##_port_params(port, &param); \
 }
 
+#define DPA_SGT_MAX_ENTRIES 16 /* maximum number of entries in SG Table */
 #define DPA_BUFF_RELEASE_MAX 8 /* maximum number of buffers released at once */
 
 /* used in napi related functions */
@@ -109,6 +110,7 @@ void dpaa_eth_init_ports(struct mac_device *mac_dev,
 struct fm_port_fqs *port_fqs,
 struct dpa_buffer_layout_s *buf_layout,
 struct device *dev);
+void dpa_release_sgt(struct qm_sg_entry *sgt);
 void __attribute__((nonnull))
 dpa_fd_release(const struct net_device *net_dev, const struct qm_fd *fd);
 int dpa_enable_tx_csum(struct dpa_priv_s *priv,
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c
index 15713a6..6050448 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c
@@ -53,6 +53,31 @@

[v2 0/9] dpaa_eth: Add the Freescale DPAA Ethernet driver

2015-08-05 Thread Madalin Bucur

This patch series adds the Ethernet driver for the Freescale
QorIQ Data Path Acceleration Architecture (DPAA).

This version includes changes following the feedback received
on previous versions from Eric Dumazet, Bob Cochran, Joe Perches,
Paul Bolle, Joakim Tjernlund, Scott Wood, David Miller - thanks!

Together with the driver a managed version of alloc_percpu
is provided that simplifies the release of percpu memory.
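
The idea behind a managed allocator can be illustrated with a small
userspace model. This is only a sketch of the general devres pattern,
not the devm_alloc_percpu() code from patch 1/9 (which is not reproduced
in this message); struct fake_device, managed_zalloc() and release_all()
are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* One recorded allocation; freed when the owning device goes away. */
struct res_node {
	void *ptr;
	struct res_node *next;
};

/* Hypothetical stand-in for struct device: only the resource list. */
struct fake_device {
	struct res_node *resources;
	unsigned int released;	/* number of resources freed at teardown */
};

/* Allocate zeroed memory and tie its lifetime to dev. */
static void *managed_zalloc(struct fake_device *dev, size_t size)
{
	struct res_node *node;

	assert(dev);
	node = malloc(sizeof(*node));
	if (!node)
		return NULL;
	node->ptr = calloc(1, size);
	if (!node->ptr) {
		free(node);
		return NULL;
	}
	/* Push on the head so teardown runs in reverse allocation order. */
	node->next = dev->resources;
	dev->resources = node;
	return node->ptr;
}

/* Free every resource still attached to dev, newest first. */
static void release_all(struct fake_device *dev)
{
	assert(dev);
	while (dev->resources) {
		struct res_node *node = dev->resources;

		dev->resources = node->next;
		free(node->ptr);
		free(node);
		dev->released++;
	}
}
```

With this shape the caller never frees individual allocations on its
error paths; a single release_all() at teardown covers them, which is
the simplification the managed percpu allocator is meant to provide.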

The Freescale DPAA architecture consists of a series of hardware
blocks that support the Ethernet connectivity. The Ethernet driver
depends upon the following drivers that are currently in the Linux
kernel or in review:
 - Peripheral Access Memory Unit (PAMU)
drivers/iommu/fsl_*
 - Frame Manager (FMan)
drivers/net/ethernet/freescale/fman
 - Queue Manager (QMan), Buffer Manager (BMan)
drivers/soc/fsl/qbman

The latest FMan driver patches were submitted by Igal Liberman:
https://patchwork.ozlabs.org/project/netdev/list/?submitter=64715&state=*&q=[v4,

The latest Q/BMan drivers were submitted by Roy Pledge:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=66331&state=*

Changes from v1:
 - bpool level Kconfig options removed
 - print format using pr_fmt, cleaned up prints
 - __hot/__cold removed
 - gratuitous unlikely() removed
 - code style aligned, consistent spacing for declarations
 - comment formatting

The complete patch set, based on the v4.2-rc5 kernel, can be found
in the public git repository http://git.freescale.com/git/cgit.cgi/ppc/upstream/linux.git
under the tag ldup_public_git_20150805:
http://git.freescale.com/git/cgit.cgi/ppc/upstream/linux.git/log/?h=ldup_public_git_20150805

There is one patch that needs to be applied to u-boot to align it
with the latest device tree binding document specification used by
the FMan driver. The patch is under the ldup_public_git_20150410
tag in the public git at:
http://git.freescale.com/git/cgit.cgi/ppc/upstream/u-boot.git/log/?h=ldup_public_git_20150410

Madalin Bucur (9):
  devres: add devm_alloc_percpu()
  dpaa_eth: add support for DPAA Ethernet
  dpaa_eth: add support for S/G frames
  dpaa_eth: add driver's Tx queue selection mechanism
  dpaa_eth: add ethtool functionality
  dpaa_eth: add sysfs exports
  dpaa_eth: add debugfs counters
  dpaa_eth: add debugfs entries
  dpaa_eth: add trace points

 Documentation/driver-model/devres.txt  |4 +
 drivers/base/devres.c  |   64 +
 drivers/net/ethernet/freescale/Kconfig |2 +
 drivers/net/ethernet/freescale/Makefile|1 +
 drivers/net/ethernet/freescale/dpaa/Kconfig|   63 +
 drivers/net/ethernet/freescale/dpaa/Makefile   |   17 +
 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c |  272 
 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.h |   43 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  860 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  489 +++
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  | 1358 
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.h  |  129 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c  |  704 ++
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |  167 +++
 .../net/ethernet/freescale/dpaa/dpaa_eth_trace.h   |  141 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |  230 
 include/linux/device.h |   19 +
 17 files changed, 4563 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Makefile
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c

-- 
1.7.11.7


[v2 8/9] dpaa_eth: add debugfs entries

2015-08-05 Thread Madalin Bucur

Export per CPU counters through debugfs.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Kconfig|   7 +
 drivers/net/ethernet/freescale/dpaa/Makefile   |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c | 272 +
 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.h |  43 
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  11 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  |  17 ++
 6 files changed, 353 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.h

diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig
index 6147403..98c6328 100644
--- a/drivers/net/ethernet/freescale/dpaa/Kconfig
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -53,4 +53,11 @@ config FSL_DPAA_INGRESS_CS_THRESHOLD
  The size in bytes of the ingress tail-drop threshold on FMan ports.
  Traffic piling up above this value will be rejected by QMan and discarded by FMan.
 
+config FSL_DPAA_ETH_DEBUGFS
+   bool "DPAA Ethernet debugfs interface"
+   depends on DEBUG_FS
+   default y
+   ---help---
+ This option compiles debugfs code for the DPAA Ethernet driver.
+
 endif # FSL_DPAA_ETH
diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index 3a276d5..3427de4 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -11,3 +11,6 @@ ccflags-y += -I$(FMAN)/flib
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
 fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o dpaa_ethtool.o dpaa_eth_sysfs.o
+ifeq ($(CONFIG_FSL_DPAA_ETH_DEBUGFS),y)
+fsl_dpa-objs += dpaa_debugfs.o
+endif
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c
new file mode 100644
index 000..bd426f0
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_debugfs.c
@@ -0,0 +1,272 @@
+/* Copyright 2008 - 2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include "dpaa_debugfs.h"
+#include "dpaa_eth.h"
+
+#define DPA_DEBUGFS_DESCRIPTION "FSL DPAA Ethernet debugfs entries"
+#define DPA_ETH_DEBUGFS_ROOT "fsl_dpa"
+
+static int dpa_debugfs_open(struct inode *inode, struct file *file);
+
+static struct dentry *dpa_debugfs_root;
+static const struct file_operations dpa_debugfs_fops = {
+   .open   = dpa_debugfs_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+static int dpa_debugfs_show(struct seq_file *file, void *offset)
+{
+   int i;
+   struct dpa_priv_s *priv;
+   struct dpa_percpu_priv_s *percpu_priv, total;
+   struct dpa_bp *dpa_bp;
+   unsigned int dpa_bp_count = 0;
+   unsigned int count_total = 0;
+   struct qm_mcr_querycgr query_cgr;
+
+   BUG_ON(!offset);
+
+   priv = netdev_priv((struct net_device *)file->private);
+
+   dpa_bp = priv->dpa_bp;
+
+   memset(&total, 0, sizeof(tota

[v2 6/9] dpaa_eth: add sysfs exports

2015-08-05 Thread Madalin Bucur

Export Frame Queue and Buffer Pool IDs through sysfs.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |   2 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   3 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   | 167 +
 5 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index e137146..3a276d5 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -10,4 +10,4 @@ ccflags-y += -I$(FMAN)/flib
 
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
-fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o dpaa_ethtool.o
+fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o dpaa_ethtool.o dpaa_eth_sysfs.o
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 264945c..a1183f4 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -739,6 +739,8 @@ dpaa_eth_priv_probe(struct platform_device *pdev)
if (err < 0)
goto netdev_init_failed;
 
+   dpaa_eth_sysfs_init(&net_dev->dev);
+
pr_info("Probed interface %s\n", net_dev->name);
 
return 0;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 55c1106..2a0ecf3 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -344,6 +344,9 @@ static inline u16 dpa_get_headroom(struct dpa_buffer_layout_s *bl)
return bl->data_align ? ALIGN(headroom, bl->data_align) : headroom;
 }
 
+void dpaa_eth_sysfs_remove(struct device *dev);
+void dpaa_eth_sysfs_init(struct device *dev);
+
 void dpa_private_napi_del(struct net_device *net_dev);
 
 static inline void clear_fd(struct qm_fd *fd)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
index ca6831a..1e43fe5 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
@@ -279,6 +279,8 @@ int dpa_remove(struct platform_device *pdev)
 
priv = netdev_priv(net_dev);
 
+   dpaa_eth_sysfs_remove(dev);
+
dev_set_drvdata(dev, NULL);
unregister_netdev(net_dev);
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
new file mode 100644
index 000..a6c71b1
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
@@ -0,0 +1,167 @@
+/* Copyright 2008-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dpaa_eth.h"
+#include "mac.h"
+
+static ssize_t dpaa_eth_show_addr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct dpa_priv_s *priv =

[v2 7/9] dpaa_eth: add debugfs counters

2015-08-05 Thread Madalin Bucur

Add a series of counters to be exported through debugfs:
- add detailed counters for reception errors;
- add detailed counters for QMan enqueue reject events;
- count the number of fragmented skbs received from the stack;
- count all frames received on the Tx confirmation path;
- add congestion group statistics;
- count the number of interrupts for each CPU.
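
Because each counter lives in per-CPU data, a reader has to sum the
per-CPU copies to get a device-wide total. A minimal userspace sketch of
that aggregation follows. It reuses the struct dpa_rx_errors layout from
this patch, but sum_rx_errors() and the plain array standing in for
per-CPU storage are assumptions made for illustration, not the driver's
debugfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same fields as struct dpa_rx_errors in the patch, with stdint types. */
struct dpa_rx_errors {
	uint64_t dme;	/* DMA Error */
	uint64_t fpe;	/* Frame Physical Error */
	uint64_t fse;	/* Frame Size Error */
	uint64_t phe;	/* Header Error */
};

/* Hypothetical helper: fold the per-CPU copies into a single total. */
static struct dpa_rx_errors
sum_rx_errors(const struct dpa_rx_errors *percpu, size_t ncpus)
{
	struct dpa_rx_errors total;
	size_t i;

	assert(percpu || ncpus == 0);
	memset(&total, 0, sizeof(total));

	for (i = 0; i < ncpus; i++) {
		total.dme += percpu[i].dme;
		total.fpe += percpu[i].fpe;
		total.fse += percpu[i].fse;
		total.phe += percpu[i].phe;
	}

	return total;
}
```

Keeping the hot-path increments CPU-local and paying the summation cost
only when the counters are read is the usual trade-off for statistics
of this kind.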

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 +++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 34 ++
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  | 40 --
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.h  |  2 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c  |  1 +
 5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index a1183f4..008562b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -98,6 +98,15 @@ static void _dpa_rx_error(struct net_device *net_dev,
 
percpu_priv->stats.rx_errors++;
 
+   if (fd->status & FM_PORT_FRM_ERR_DMA)
+   percpu_priv->rx_errors.dme++;
+   if (fd->status & FM_PORT_FRM_ERR_PHYSICAL)
+   percpu_priv->rx_errors.fpe++;
+   if (fd->status & FM_PORT_FRM_ERR_SIZE)
+   percpu_priv->rx_errors.fse++;
+   if (fd->status & FM_PORT_FRM_ERR_PRS_HDR_ERR)
+   percpu_priv->rx_errors.phe++;
+
dpa_fd_release(net_dev, fd);
 }
 
@@ -161,6 +170,8 @@ static void _dpa_tx_conf(struct net_device *net_dev,
percpu_priv->stats.tx_errors++;
}
 
+   percpu_priv->tx_confirm++;
+
skb = _dpa_cleanup_tx_fd(priv, fd);
 
dev_kfree_skb(skb);
@@ -296,6 +307,7 @@ static void priv_ern(struct qman_portal *portal,
 
percpu_priv->stats.tx_dropped++;
percpu_priv->stats.tx_fifo_errors++;
+   count_ern(percpu_priv, msg);
 
/* If we intended this buffer to go into the pool
 * when the FM was done, we need to put it in
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 2a0ecf3..c66140e 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -194,6 +194,25 @@ struct dpa_bp {
void (*free_buf_cb)(void *addr);
 };
 
+struct dpa_rx_errors {
+   u64 dme;/* DMA Error */
+   u64 fpe;/* Frame Physical Error */
+   u64 fse;/* Frame Size Error */
+   u64 phe;/* Header Error */
+};
+
+/* Counters for QMan ERN frames - one counter per rejection code */
+struct dpa_ern_cnt {
+   u64 cg_tdrop;   /* Congestion group taildrop */
+   u64 wred;   /* WRED congestion */
+   u64 err_cond;   /* Error condition */
+   u64 early_window;   /* Order restoration, frame too early */
+   u64 late_window;/* Order restoration, frame too late */
+   u64 fq_tdrop;   /* FQ taildrop */
+   u64 fq_retired; /* FQ is retired */
+   u64 orp_zero;   /* ORP disabled */
+};
+
 struct dpa_napi_portal {
struct napi_struct napi;
struct qman_portal *p;
@@ -202,7 +221,13 @@ struct dpa_napi_portal {
 struct dpa_percpu_priv_s {
struct net_device *net_dev;
struct dpa_napi_portal *np;
+   u64 in_interrupt;
+   u64 tx_confirm;
+   /* fragmented (non-linear) skbuffs received from the stack */
+   u64 tx_frag_skbuffs;
struct rtnl_link_stats64 stats;
+   struct dpa_rx_errors rx_errors;
+   struct dpa_ern_cnt ern_cnt;
 };
 
 struct dpa_priv_s {
@@ -233,6 +258,14 @@ struct dpa_priv_s {
 * (and the same) congestion group.
 */
struct qman_cgr cgr;
+   /* If congested, when it began. Used for performance stats. */
+   u32 congestion_start_jiffies;
+   /* Number of jiffies the Tx port was congested. */
+   u32 congested_jiffies;
+   /* Counter for the number of times the CGR
+* entered congestion state
+*/
+   u32 cgr_congested_count;
} cgr_data;
/* Use a per-port CGR for ingress traffic. */
bool use_ingress_cgr;
@@ -294,6 +327,7 @@ static inline int dpaa_eth_napi_schedule(struct dpa_percpu_priv_s *percpu_priv,
 
np->p = portal;
napi_schedule(&np->napi);
+   percpu_priv->in_interrupt++;
return 1;
}
}
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
index 1e43fe5..459132b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
+++ b/drivers/net/ethernet/free

[v2 5/9] dpaa_eth: add ethtool functionality

2015-08-05 Thread Madalin Bucur

Add support for basic ethtool operations.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   2 +-
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.h  |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 230 +
 4 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index cf126dd..e137146 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -10,4 +10,4 @@ ccflags-y += -I$(FMAN)/flib
 
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
-fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o
+fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o dpaa_ethtool.o
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
index 1258683..ca6831a 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
@@ -81,6 +81,8 @@ int dpa_netdev_init(struct net_device *net_dev,
memcpy(net_dev->perm_addr, mac_addr, net_dev->addr_len);
memcpy(net_dev->dev_addr, mac_addr, net_dev->addr_len);
 
+   net_dev->ethtool_ops = &dpa_ethtool_ops;
+
net_dev->needed_headroom = priv->tx_headroom;
net_dev->watchdog_timeo = msecs_to_jiffies(tx_timeout);
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
index 4581bfc..a940561 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
@@ -58,6 +58,9 @@
 /* used in napi related functions */
 extern u16 qman_portal_max;
 
+/* from dpa_ethtool.c */
+extern const struct ethtool_ops dpa_ethtool_ops;
+
 int dpa_netdev_init(struct net_device *net_dev,
const u8 *mac_addr,
u16 tx_timeout);
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
new file mode 100644
index 000..069fcf1
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -0,0 +1,230 @@
+/* Copyright 2008-2015 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+
+#include "dpaa_eth.h"
+#include "mac.h"
+#include "dpaa_eth_common.h"
+
+static int dpa_get_settings(struct net_device *net_dev,
+   struct ethtool_cmd *et_cmd)
+{
+   int err;
+   struct dpa_priv_s *priv;
+
+   priv = netdev_priv(net_dev);
+
+   if (!priv->mac_dev->phy_dev) {
+   netdev_dbg(net_dev, "phy device not initialized\n");
+   return 0;
+   }
+
+   err = phy_ethtool_gset(priv->mac_dev->phy_dev, et_cmd);
+
+   return err;
+}
+
+static int dpa_set_settings(struct net_device *net_dev,
+   struct ethtool_cmd *et_cmd)
+{
+   int err;
+   struct dpa_priv_s *priv;
+
+   priv = netdev_priv(net_dev);
+

[v2 2/9] dpaa_eth: add support for DPAA Ethernet

2015-08-05 Thread Madalin Bucur

This introduces the Freescale Data Path Acceleration Architecture
(DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
the Freescale DPAA QorIQ platforms.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/Kconfig |2 +
 drivers/net/ethernet/freescale/Makefile|1 +
 drivers/net/ethernet/freescale/dpaa/Kconfig|   46 +
 drivers/net/ethernet/freescale/dpaa/Makefile   |   13 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  814 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  442 +++
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.c  | 1248 
 .../net/ethernet/freescale/dpaa/dpaa_eth_common.h  |  118 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c  |  406 +++
 9 files changed, 3090 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Makefile
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_common.h
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sg.c

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index f3f89cc..92198be 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -92,4 +92,6 @@ config GIANFAR
  and MPC86xx family of chips, the eTSEC on LS1021A and the FEC
  on the 8540.
 
+source "drivers/net/ethernet/freescale/dpaa/Kconfig"
+
 endif # NET_VENDOR_FREESCALE
diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
index 4097c58..ae13dc5 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_FS_ENET) += fs_enet/
 obj-$(CONFIG_FSL_PQ_MDIO) += fsl_pq_mdio.o
 obj-$(CONFIG_FSL_XGMAC_MDIO) += xgmac_mdio.o
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
+obj-$(CONFIG_FSL_DPAA_ETH) += dpaa/
 obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
 gianfar_driver-objs := gianfar.o \
gianfar_ethtool.o
diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig
new file mode 100644
index 000..1f3a203
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -0,0 +1,46 @@
+menuconfig FSL_DPAA_ETH
+   tristate "DPAA Ethernet"
+   depends on FSL_SOC && FSL_BMAN && FSL_QMAN && FSL_FMAN
+   select PHYLIB
+   select FSL_FMAN_MAC
+   ---help---
+ Data Path Acceleration Architecture Ethernet driver,
+ supporting the Freescale QorIQ chips.
+ Depends on Freescale Buffer Manager and Queue Manager
+ driver and Frame Manager Driver.
+
+if FSL_DPAA_ETH
+
+config FSL_DPAA_CS_THRESHOLD_1G
+   hex "Egress congestion threshold on 1G ports"
+   range 0x1000 0x1000
+   default "0x0600"
+   ---help---
+ The size in bytes of the egress Congestion State notification threshold on 1G ports.
+ The 1G dTSECs can quite easily be flooded by cores doing Tx in a tight loop
+ (e.g. by sending UDP datagrams at "while(1) speed"),
+ and the larger the frame size, the more acute the problem.
+ So we have to find a balance between these factors:
+  - avoiding the device staying congested for a prolonged time (risking
+ the netdev watchdog to fire - see also the tx_timeout module param);
+   - affecting performance of protocols such as TCP, which otherwise
+behave well under the congestion notification mechanism;
+  - preventing the Tx cores from tightly-looping (as if the congestion
+threshold was too low to be effective);
+  - running out of memory if the CS threshold is set too high.
+
+config FSL_DPAA_CS_THRESHOLD_10G
+   hex "Egress congestion threshold on 10G ports"
+   range 0x1000 0x2000
+   default "0x1000"
+   ---help ---
+ The size in bytes of the egress Congestion State notification threshold on 10G ports.
+
+config FSL_DPAA_INGRESS_CS_THRESHOLD
+   hex "Ingress congestion threshold on FMan ports"
+   default "0x1000"
+   ---help---
+ The size in bytes of the ingress tail-drop threshold on FMan ports.
+ Traffic piling up above this value will be rejected by QMan and discarded by FMan.
+
+endif # FSL_DPAA_ETH
diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
new file mode 100644
index 000..cf126dd
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for the Freescale

[v2 9/9] dpaa_eth: add trace points

2015-08-05 Thread Madalin Bucur

Add trace points on the hot processing path.

Signed-off-by: Ruxandra Ioana Radulescu 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  12 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   4 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_trace.h   | 141 +
 4 files changed, 158 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index 3427de4..bf7248a 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -11,6 +11,7 @@ ccflags-y += -I$(FMAN)/flib
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
 fsl_dpa-objs += dpaa_eth.o dpaa_eth_sg.o dpaa_eth_common.o dpaa_ethtool.o dpaa_eth_sysfs.o
+CFLAGS_dpaa_eth.o := -I$(src)
 ifeq ($(CONFIG_FSL_DPAA_ETH_DEBUGFS),y)
 fsl_dpa-objs += dpaa_debugfs.o
 endif
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ea25bf1..7f2413b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -61,6 +61,12 @@
 #include "dpaa_debugfs.h"
 #endif /* CONFIG_FSL_DPAA_ETH_DEBUGFS */
 
+/* CREATE_TRACE_POINTS only needs to be defined once. Other dpa files
+ * using trace events only need to #include 
+ */
+#define CREATE_TRACE_POINTS
+#include "dpaa_eth_trace.h"
+
 #define DPA_NAPI_WEIGHT64
 
 /* Valid checksum indication */
@@ -226,6 +232,9 @@ priv_rx_default_dqrr(struct qman_portal *portal,
priv = netdev_priv(net_dev);
dpa_bp = priv->dpa_bp;
 
+   /* Trace the Rx fd */
+   trace_dpa_rx_fd(net_dev, fq, &dq->fd);
+
/* IRQ handler, non-migratable; safe to use raw_cpu_ptr here */
percpu_priv = raw_cpu_ptr(priv->percpu_priv);
count_ptr = raw_cpu_ptr(dpa_bp->percpu_count);
@@ -282,6 +291,9 @@ priv_tx_conf_default_dqrr(struct qman_portal *portal,
net_dev = ((struct dpa_fq *)fq)->net_dev;
priv = netdev_priv(net_dev);
 
+   /* Trace the fd */
+   trace_dpa_tx_conf_fd(net_dev, fq, &dq->fd);
+
/* Non-migratable context, safe to use raw_cpu_ptr */
percpu_priv = raw_cpu_ptr(priv->percpu_priv);
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index c66140e..4ac917a 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -36,6 +36,7 @@
 
 #include "fm_ext.h"
 #include "mac.h"
+#include "dpaa_eth_trace.h"
 
 extern int dpa_rx_extra_headroom;
 extern int dpa_max_frm;
@@ -417,6 +418,9 @@ static inline int dpa_xmit(struct dpa_priv_s *priv,
_dpa_get_tx_conf_queue(priv, egress_fq)
);
 
+   /* Trace this Tx fd */
+   trace_dpa_tx_fd(priv->net_dev, egress_fq, fd);
+
for (i = 0; i < 10; i++) {
err = qman_enqueue(egress_fq, fd, 0);
if (err != -EBUSY)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h
new file mode 100644
index 000..3b67477
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h
@@ -0,0 +1,141 @@
+/* Copyright 2013-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+

[PATCH net-next] r8169:Issues on alloc memory

2015-08-05 Thread Corcodel Marian

This patch addresses several descriptor allocation issues. DESC_ARRAY is the
number of descriptors per array on Tx and Rx, and matches the TxDesc and
RxDesc structures. MAX_DESCRIPTORS is 1024 on both Rx and Tx, which includes
the 256 descriptors provided by the chip. DESC_ARRAY * NUM_ARRAYS_MAX must
match MAX_DESCRIPTORS: 256 from the chip and the rest from memory.
DESC_ARRAY * NUM_ARRAYS_MIN is the number of hardware descriptors on the chip.
The RTL 8101/8102 and RTL 8169 documentation reports the same number of
descriptors, 1024.
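
The arithmetic in this description can be checked at compile time. The following is only a minimal sketch using the values from the defines this patch adds; the helper names total_descriptors() and chip_descriptors() are hypothetical and are not part of the driver:

```c
#include <assert.h>

/* Values taken from the defines added by this patch. */
#define DESC_ARRAY       16    /* descriptors per array */
#define NUM_ARRAYS_MAX   64    /* maximum number of arrays */
#define NUM_ARRAYS_MIN   16    /* minimum number of arrays */
#define MAX_DESCRIPTORS  1024  /* total descriptors per direction */

/* Compile-time check: 16 * 64 must add up to MAX_DESCRIPTORS. */
static_assert(DESC_ARRAY * NUM_ARRAYS_MAX == MAX_DESCRIPTORS,
	      "arrays must add up to MAX_DESCRIPTORS");

/* Hypothetical helper: total descriptors, chip plus memory. */
static inline unsigned int total_descriptors(void)
{
	return DESC_ARRAY * NUM_ARRAYS_MAX;
}

/* Hypothetical helper: descriptors described as coming from the chip. */
static inline unsigned int chip_descriptors(void)
{
	return DESC_ARRAY * NUM_ARRAYS_MIN;
}
```

Note that the ring sizes in the diff are computed from NUM_ARRAYS_MAX (64) alone, not from DESC_ARRAY * NUM_ARRAYS_MAX.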

Signed-off-by: Corcodel Marian 

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index bf78f94..8bf8c3f 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -74,7 +74,7 @@
(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
 
 #define TX_SLOTS_AVAIL(tp) \
-   (tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)
+   (tp->dirty_tx + NUM_ARRAYS_MAX - tp->cur_tx)
 
 /* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
 #define TX_FRAGS_READY_FOR(tp,nr_frags) \
@@ -87,13 +87,17 @@ static const int multicast_filter_limit = 32;
 #define MAX_READ_REQUEST_SHIFT 12
 #define TX_DMA_BURST   7   /* Maximum PCI burst, '7' is unlimited */
 #define InterFrameGap  0x03 /* 3 means InterFrameGap = the shortest one */
+#define DESC_ARRAY 16  /* Number of descriptors on array to Rx and Tx */
+#define NUM_ARRAYS_MAX 64  /* Number of arrays descriptors maximum on Rx and Tx */
+#define NUM_ARRAYS_MIN 16  /* Number of arrays descriptors minimum on Rx and Tx */
+#define MAX_DESCRIPTORS 1024 /* Number of descriptors total and maximum on Rx and Tx */
 
 #define R8169_REGS_SIZE 256
 #define R8169_NAPI_WEIGHT  64
 #define NUM_TX_DESC 64  /* Number of Tx descriptor registers */
 #define NUM_RX_DESC 256U /* Number of Rx descriptor registers */
-#define R8169_TX_RING_BYTES (NUM_TX_DESC * sizeof(struct TxDesc))
-#define R8169_RX_RING_BYTES (NUM_RX_DESC * sizeof(struct RxDesc))
+#define R8169_TX_RING_BYTES (NUM_ARRAYS_MAX * sizeof(struct TxDesc)) /* here sizeof not reporting correct */
+#define R8169_RX_RING_BYTES (NUM_ARRAYS_MAX * sizeof(struct RxDesc)) /* here sizeof not reporting correct */
 
 #define RTL8169_TX_TIMEOUT (6*HZ)
 #define RTL8169_PHY_TIMEOUT (10*HZ)
@@ -778,8 +782,8 @@ struct rtl8169_private {
struct RxDesc *RxDescArray; /* 256-aligned Rx descriptor ring */
dma_addr_t TxPhyAddr;
dma_addr_t RxPhyAddr;
-   void *Rx_databuff[NUM_RX_DESC]; /* Rx data buffers */
-   struct ring_info tx_skb[NUM_TX_DESC];   /* Tx data buffers */
+   void *Rx_databuff[NUM_ARRAYS_MAX];  /* Rx data buffers */
+   struct ring_info tx_skb[NUM_ARRAYS_MAX];/* Tx data buffers */
struct timer_list timer;
u16 cp_cmd;
 
@@ -6679,7 +6683,7 @@ static void rtl8169_rx_clear(struct rtl8169_private *tp)
 {
unsigned int i;
 
-   for (i = 0; i < NUM_RX_DESC; i++) {
+   for (i = 0; i < NUM_ARRAYS_MAX; i++) {
if (tp->Rx_databuff[i]) {
rtl8169_free_rx_databuff(tp, tp->Rx_databuff + i,
tp->RxDescArray + i);
@@ -6696,7 +6700,7 @@ static int rtl8169_rx_fill(struct rtl8169_private *tp)
 {
unsigned int i;
 
-   for (i = 0; i < NUM_RX_DESC; i++) {
+   for (i = 0; i < NUM_ARRAYS_MAX; i++) {
void *data;
 
if (tp->Rx_databuff[i])
@@ -6710,7 +6714,7 @@ static int rtl8169_rx_fill(struct rtl8169_private *tp)
tp->Rx_databuff[i] = data;
}
 
-   rtl8169_mark_as_last_descriptor(tp->RxDescArray + NUM_RX_DESC - 1);
+   rtl8169_mark_as_last_descriptor(tp->RxDescArray + NUM_ARRAYS_MAX - 1);
return 0;
 
 err_out:
@@ -6724,8 +6728,8 @@ static int rtl8169_init_ring(struct net_device *dev)
 
rtl8169_init_ring_indexes(tp);
 
-   memset(tp->tx_skb, 0x0, NUM_TX_DESC * sizeof(struct ring_info));
-   memset(tp->Rx_databuff, 0x0, NUM_RX_DESC * sizeof(void *));
+   memset(tp->tx_skb, 0x0, DESC_ARRAY * NUM_ARRAYS_MIN);
+   memset(tp->Rx_databuff, 0x0, DESC_ARRAY * NUM_ARRAYS_MIN);
 
return rtl8169_rx_fill(tp);
 }
@@ -6749,7 +6753,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private 
*tp, u32 start,
unsigned int i;
 
for (i = 0; i < n; i++) {
-   unsigned int entry = (start + i) % NUM_TX_DESC;
+   unsigned int entry = (start + i) % NUM_ARRAYS_MAX;
struct ring_info *tx_skb = tp->tx_skb + entry;
unsigned int len = tx_skb->len;
 
@@ -6769,7 +6773,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private 
*tp, u32 start,
 
 static void rtl8169_tx_clear(struct rtl8169_private *tp)
 {
-   rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
+   rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_ARRAYS_MAX);
tp->cur_tx = tp->dirty_tx = 0;
 }
 
@@ -6784,7

Re: [PATCH] ARCNET: fix hard_header_len limit

2015-08-05 Thread Michael Grzeschik

On Thu, Jul 30, 2015 at 11:16:36AM -0700, David Miller wrote:
> From: Michael Grzeschik 
> Date: Thu, 30 Jul 2015 15:34:36 +0200
> 
> > The commit <9c7077622dd9> ("packet: make packet_snd fail on len smaller
> > than l2 header") adds the check for minimum packet length of the used l2.
> > For arcnet the hardware header length is not the complete archdr which
> > includes hard + soft header. This patch changes the length to
> > sizeof(arc_hardware).
> > 
> > Signed-off-by: Michael Grzeschik 
> 
> The hard header len is used for other purposes as well, are you sure
> those don't get broken by this change?

Its meaning is to represent the amount of the hardware (link layer)
data of one packet.

Which other purposes do you mean?
Can you point to some code?

> Code assumes that if the data at the SKB mac pointer is taken, for
> dev->hard_header_len bytes, that is exactly the link layer header.
> And that this can be used to compare two MAC headers, copy the
> MAC header from one packet to another, etc.

The ARCnet link layer header is 4 bytes long: 1 byte source, 1 byte
destination and two offset bytes, as described by struct arc_hardware in
if_arcnet.h. The above condition is fulfilled when the mac pointer
is 0.

The remaining bytes of struct archdr have a meaning that depends on the
protocol in use and are represented by a union (network layer).

In the case of raw packets, the payload comes immediately after the
hard_header.
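
The layout described above can be sketched in plain C. This is only an illustrative userspace mirror, assuming the field order given in this mail; the *_sketch types, the two soft header variants and hard_header_len_sketch() are hypothetical names, not the definitions from if_arcnet.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Link layer part: 1 byte source, 1 byte destination, two offset bytes. */
struct arc_hardware_sketch {
	uint8_t source;
	uint8_t dest;
	uint8_t offset[2];
};

/* Hypothetical soft headers; the real ones depend on the protocol. */
struct soft_a_sketch {
	uint8_t  proto;
	uint8_t  split_flag;
	uint16_t sequence;
};

struct soft_b_sketch {
	uint8_t proto;
};

/* Hard header followed by a protocol-dependent union (network layer). */
struct archdr_sketch {
	struct arc_hardware_sketch hard;
	union {
		struct soft_a_sketch a;
		struct soft_b_sketch b;
		uint8_t raw[4];
	} soft;
};

/* The length this patch proposes for hard_header_len: hard part only. */
static inline size_t hard_header_len_sketch(void)
{
	return sizeof(struct arc_hardware_sketch);
}
```

With this layout the hard header is 4 bytes and the soft part starts right after it, so a raw payload would begin at offset 4.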

Thanks,
Michael

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: veths often slow to come up

2015-08-05 Thread Thadeu Lima de Souza Cascardo

On Tue, Aug 04, 2015 at 08:26:28PM -0700, Cong Wang wrote:
> (Cc'ing netdev for network issues)
> 
> On Tue, Aug 4, 2015 at 6:42 AM, Shaun Crampton
>  wrote:
> > Please CC me on any responses, thanks.
> >
> > Setting both ends of a veth to be oper UP completes very quickly but I
> > find that pings only start flowing over the veth after about a second.
> > This seems to correlate with the NO-CARRIER flag being set or the
> > interface being in "state UNKNOWN" or "state DOWN² for about a second
> > (demo script below).
> >
> > If I run the script repeatedly then sometimes it completes very quickly on
> > subsequent runs as if there's a hot cache somewhere.
> >
> > Could this be a bug or is there a configuration to speed this up?  Seems
> > odd that it's almost exactly 1s on the first run.
> >
> > Seen on these kernels:
> > * 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64
> > x86_64 x86_64 GNU/Linux
> > * 4.0.9-coreos #2 SMP Thu Jul 30 01:07:55 UTC 2015 x86_64 Intel(R) Xeon(R)
> > CPU @ 2.50GHz GenuineIntel GNU/Linux
> >
> > Regards,
> >
> > -Shaun
> >

Take a look at linkwatch_urgent_event at net/core/link_watch.c, and all of
link_watch.c in general. That's where the 1s delay comes from. It's designed to
prevent link message storms.
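
As a rough illustration of that rate limiting, here is a simplified userspace model, not the kernel's code. It assumes a one second minimum interval between non-urgent link events and ignores timer wrap-around; LINKWATCH_INTERVAL_MS, linkwatch_delay_ms() and linkwatch_next_event_ms() are hypothetical names:

```c
#include <stdbool.h>

/* Assumed minimum spacing between non-urgent link events, in ms. */
#define LINKWATCH_INTERVAL_MS 1000UL

/*
 * How long a link event waits before it is processed.
 * Urgent events are handled immediately; others wait until the
 * earliest time at which the next event may be processed.
 */
static unsigned long linkwatch_delay_ms(unsigned long now_ms,
					unsigned long next_event_ms,
					bool urgent)
{
	if (urgent)
		return 0;
	if (next_event_ms <= now_ms)
		return 0;
	return next_event_ms - now_ms;
}

/* After a batch of events is processed, the next slot moves forward. */
static unsigned long linkwatch_next_event_ms(unsigned long now_ms)
{
	return now_ms + LINKWATCH_INTERVAL_MS;
}
```

In this model, a carrier event that arrives just after another event has been processed waits close to the full interval unless it is marked urgent.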

In particular, look at commit 294cc44b7e48a6e7732499eebcf409b231460d8e, which
added the urgent event.

I suspect this was designed to work around buggy drivers/hardware, not to help
userspace handle thousands of virtual devices being created and destroyed all
the time.

Maybe virtual devices should be whitelisted here? Maybe the patch below is
stupid, because drivers may abuse it, and drivers are buggy, otherwise linkwatch
would not be needed in the first place.

Regards.
Cascardo.

> >
> > Running my test script below (Assumes veth0/1 do not already exist):
> >
> > $ sudo ./veth-test.sh
> > Time to create veth:
> >
> > real    0m0.019s
> > user    0m0.002s
> > sys     0m0.010s
> >
> > Time to wait for carrier:
> >
> > real    0m1.005s
> > user    0m0.007s
> > sys     0m0.123s
> >
> >
> >
> > # veth-test.sh
> >
> > #!/bin/bash
> > function create_veth {
> >   ip link add type veth
> >   ip link set veth0 up
> >   ip link set veth1 up
> > }
> > function wait_for_carrier {
> >   while ! ip link show | grep -qE 'veth[01]';
> >   do
> > sleep 0.05
> >   done
> >   while ip link show | grep -E 'veth[01]' | \
> > grep -Eq 'NO-CARRIER|state DOWN|state UNKNOWN';
> >   do
> > sleep 0.05
> >   done
> > }
> > echo "Time to create veth:"
> > time create_veth
> > echo
> > echo "Time to wait for carrier:"
> > time wait_for_carrier
> > ip link del veth0
---
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 343592c..91123a8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -306,6 +306,7 @@ static void veth_setup(struct net_device *dev)
 
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+   dev->priv_flags |= IFF_LINKWATCH_URGENT;
 
dev->netdev_ops = &veth_netdev_ops;
dev->ethtool_ops = &veth_ethtool_ops;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f4..138f5e9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1262,6 +1262,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  * change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_LINKWATCH_URGENT: device does not flood with link updates
  */
 enum netdev_priv_flags {
IFF_802_1Q_VLAN = 1<<0,
@@ -1289,6 +1290,7 @@ enum netdev_priv_flags {
IFF_XMIT_DST_RELEASE_PERM   = 1<<22,
IFF_IPVLAN_MASTER   = 1<<23,
IFF_IPVLAN_SLAVE= 1<<24,
+   IFF_LINKWATCH_URGENT= 1<<25,
 };
 
 #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 9828616..e2957a0 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -95,6 +95,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
if (dev->priv_flags & IFF_TEAM_PORT)
return true;
 
+   if (dev->priv_flags & IFF_LINKWATCH_URGENT)
+   return true;
+
return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
 }
--- 

[PATCH 0/2] of: fsl/fman: reuse the fixed node parsing code

2015-08-05 Thread Madalin Bucur

The FMan MAC configuration code needs the speed and duplex information
for fixed-link interfaces, which is currently parsed by the OF function
of_phy_register_fixed_link(). That function parses the fixed-link
parameters but exposes to the caller neither the phy_device pointer nor
the status struct into which it loads them. Extracting the fixed-link
parsing code from of_phy_register_fixed_link() into a separate function
makes the parsed values available without changing the existing API.
This change also removes a small redundancy in the previous code, which
called fixed_phy_register() in two places.

The FMan patch relies on the latest FMan driver v4 submission by Igal Liberman:
https://patchwork.ozlabs.org/project/netdev/list/?submitter=Igal.Liberman&state=*&q=v4

Madalin Bucur (2):
  of: separate fixed link parsing from registration
  fsl_fman: use fixed_phy_status for MEMAC

 .../ethernet/freescale/fman/flib/fsl_fman_memac.h  |  6 ++-
 drivers/net/ethernet/freescale/fman/inc/mac.h  |  2 +-
 drivers/net/ethernet/freescale/fman/mac/fm_memac.c | 42 -
 drivers/net/ethernet/freescale/fman/mac/fm_memac.h |  3 +-
 drivers/net/ethernet/freescale/fman/mac/mac.c  | 18 ++--
 drivers/of/of_mdio.c   | 52 ++
 include/linux/of_mdio.h|  9 
 7 files changed, 94 insertions(+), 38 deletions(-)

-- 
1.7.11.7


[PATCH RFC 2/2] fsl_fman: use fixed_phy_status for MEMAC

2015-08-05 Thread Madalin Bucur

Use the speed and duplex information from the device tree fixed link
node accessing the status structure parsed by of_phy_parse_fixed_link().
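
The mapping from the parsed fixed-link values to the SGMII interface mode bits can be sketched as a small standalone function. This is only a sketch: the IF_MODE_* bit values below are placeholders chosen for illustration, not the real PHY_SGMII_IF_MODE_* definitions, and fixed_link_if_mode() is a hypothetical name:

```c
#include <stdint.h>

/* Placeholder bit values, NOT the driver's PHY_SGMII_IF_MODE_* values. */
#define IF_MODE_SPEED_10M    0x01u
#define IF_MODE_SPEED_100M   0x02u
#define IF_MODE_SPEED_GB     0x04u
#define IF_MODE_DUPLEX_FULL  0x10u
#define IF_MODE_DUPLEX_HALF  0x20u

/*
 * Build the mode bits from a fixed-link speed (in Mbit/s) and duplex
 * flag. Unknown speeds fall back to 1G, mirroring the default case
 * in the patch.
 */
static uint16_t fixed_link_if_mode(int speed, int full_duplex)
{
	uint16_t reg = 0;

	switch (speed) {
	case 10:
		reg |= IF_MODE_SPEED_10M;
		break;
	case 100:
		reg |= IF_MODE_SPEED_100M;
		break;
	case 1000:
	default:
		reg |= IF_MODE_SPEED_GB;
		break;
	}

	reg |= full_duplex ? IF_MODE_DUPLEX_FULL : IF_MODE_DUPLEX_HALF;
	return reg;
}
```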

Signed-off-by: Madalin Bucur 
---
 .../ethernet/freescale/fman/flib/fsl_fman_memac.h  |  6 ++--
 drivers/net/ethernet/freescale/fman/inc/mac.h  |  2 +-
 drivers/net/ethernet/freescale/fman/mac/fm_memac.c | 42 --
 drivers/net/ethernet/freescale/fman/mac/fm_memac.h |  3 +-
 drivers/net/ethernet/freescale/fman/mac/mac.c  | 18 +++---
 5 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/flib/fsl_fman_memac.h 
b/drivers/net/ethernet/freescale/fman/flib/fsl_fman_memac.h
index ebf7989..50bed14 100644
--- a/drivers/net/ethernet/freescale/fman/flib/fsl_fman_memac.h
+++ b/drivers/net/ethernet/freescale/fman/flib/fsl_fman_memac.h
@@ -33,8 +33,10 @@
 #define __FSL_FMAN_MEMAC_H
 
 #include 
-
+#include 
+#include 
 #include "fsl_enet.h"
+
 /* Num of additional exact match MAC adr regs */
 #define MEMAC_NUM_OF_PADDRS 7
 
@@ -373,7 +375,7 @@ struct memac_cfg {
bool tx_pbl_fwd;
bool debug_mode;
bool wake_on_lan;
-   bool fixed_link;
+   struct fixed_phy_status *fixed_link;
u16 max_frame_length;
u16 pause_quanta;
u32 tx_ipg_length;
diff --git a/drivers/net/ethernet/freescale/fman/inc/mac.h 
b/drivers/net/ethernet/freescale/fman/inc/mac.h
index f86d0bc..fbeb957 100644
--- a/drivers/net/ethernet/freescale/fman/inc/mac.h
+++ b/drivers/net/ethernet/freescale/fman/inc/mac.h
@@ -62,7 +62,7 @@ struct mac_device {
phy_interface_t  phy_if;
u32  if_support;
bool link;
-   bool fixed_link;
+   struct fixed_phy_status *fixed_link;
u16  speed;
u16  max_speed;
struct device_node  *phy_node;
diff --git a/drivers/net/ethernet/freescale/fman/mac/fm_memac.c 
b/drivers/net/ethernet/freescale/fman/mac/fm_memac.c
index becbb88..3d5ede3 100644
--- a/drivers/net/ethernet/freescale/fman/mac/fm_memac.c
+++ b/drivers/net/ethernet/freescale/fman/mac/fm_memac.c
@@ -71,9 +71,10 @@ static int memac_mii_write_phy_reg(struct memac_t *memac, u8 
phy_addr,
 }
 
 static void setup_sgmii_internal_phy(struct memac_t *memac, u8 phy_addr,
-bool fixed_link)
+struct fixed_phy_status *fixed_link)
 {
u16 tmp_reg16;
+   enum ethernet_interface enet_if;
enum e_enet_mode enet_mode;
 
/* In case the higher MACs are used (i.e. the MACs that should
@@ -81,20 +82,37 @@ static void setup_sgmii_internal_phy(struct memac_t *memac, 
u8 phy_addr,
 * Temporary modify enet mode to 1G one, so MII functions can
 * work correctly.
 */
+   enet_if = ENET_INTERFACE_FROM_MODE(memac->enet_mode);
enet_mode = memac->enet_mode;
-   memac->enet_mode =
-   MAKE_ENET_MODE(ENET_INTERFACE_FROM_MODE(memac->enet_mode),
-  ENET_SPEED_1000);
+   memac->enet_mode = MAKE_ENET_MODE(enet_if, ENET_SPEED_1000);
 
/* SGMII mode */
tmp_reg16 = PHY_SGMII_IF_MODE_SGMII;
if (!fixed_link)
/* AN enable */
tmp_reg16 |= PHY_SGMII_IF_MODE_AN;
-   else
-   /* Fixed link 1Gb FD */
-   tmp_reg16 |= PHY_SGMII_IF_MODE_SPEED_GB |
-PHY_SGMII_IF_MODE_DUPLEX_FULL;
+   else {
+   switch (fixed_link->speed) {
+   case 10:
+   tmp_reg16 |= PHY_SGMII_IF_MODE_SPEED_10M;
+   memac->enet_mode = MAKE_ENET_MODE(enet_if,
+ ENET_SPEED_10);
+   break;
+   case 100:
+   tmp_reg16 |= PHY_SGMII_IF_MODE_SPEED_100M;
+   memac->enet_mode = MAKE_ENET_MODE(enet_if,
+ ENET_SPEED_100);
+   break;
+   case 1000: /* fallthrough */
+   default:
+   tmp_reg16 |= PHY_SGMII_IF_MODE_SPEED_GB;
+   break;
+   }
+   if (fixed_link->duplex)
+   tmp_reg16 |= PHY_SGMII_IF_MODE_DUPLEX_FULL;
+   else
+   tmp_reg16 |= PHY_SGMII_IF_MODE_DUPLEX_HALF;
+   }
memac_mii_write_phy_reg(memac, phy_addr, 0x14, tmp_reg16);
 
/* Device ability according to SGMII specification */
@@ -120,6 +138,7 @@ static void setup_sgmii_internal_phy(struct memac_t *memac, 
u8 phy_addr,
/* Restart AN */
tmp_reg16 = PHY_SGMII_CR_DEF_VAL | PHY_SGMII_CR_RESET_AN;
else
+   /* AN disabled */
tmp_reg16 = PHY_SGMII_CR_DEF_VAL & ~PHY_SGMII_CR_AN_ENABLE;
memac_mii_write_phy_reg(memac, phy_addr, 0x0, tmp_reg16);
 
@@

[PATCH RFC 1/2] of: separate fixed link parsing from registration

2015-08-05 Thread Madalin Bucur

Some drivers may need to parse the fixed link values before registering
the fixed link phy or access the status values. Separate the parsing from
the actual registration and provide an export for the added parsing function.
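
The split between parsing and registration can be modelled in userspace as follows. This is a mock under stated assumptions, not the kernel API: node_sketch stands in for a device tree node, link_status_sketch for the status struct, and both functions are hypothetical. The wrapper here passes the parser's error code through unchanged, which is one possible design choice:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed fixed-link status. */
struct link_status_sketch {
	int link;
	int speed;
	int duplex;
};

/* Hypothetical stand-in for a device tree node. */
struct node_sketch {
	int has_fixed_link;
	int has_speed;
	int speed;
	int full_duplex;
};

/* Parsing only: fill a caller-owned status struct, register nothing. */
static int parse_fixed_link_sketch(const struct node_sketch *np,
				   struct link_status_sketch *st)
{
	if (!np || !np->has_fixed_link)
		return -ENODEV;
	if (!np->has_speed)
		return -EINVAL;

	st->link = 1;
	st->speed = np->speed;
	st->duplex = np->full_duplex;
	return 0;
}

/* Registration wrapper: parse first, then register in one place. */
static int register_fixed_link_sketch(const struct node_sketch *np)
{
	struct link_status_sketch st = { 0, 0, 0 };
	int err = parse_fixed_link_sketch(np, &st);

	if (err)
		return err; /* pass the parser's error code through */

	/* A real driver would register the fixed PHY with st here. */
	return 0;
}
```

A driver that only needs the speed and duplex values would call the parsing function directly and read them from its own status struct.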

Signed-off-by: Madalin Bucur 
---
 drivers/of/of_mdio.c| 52 +++--
 include/linux/of_mdio.h |  9 +
 2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index fdc60db..b7e8288 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -284,43 +284,57 @@ bool of_phy_is_fixed_link(struct device_node *np)
 }
 EXPORT_SYMBOL(of_phy_is_fixed_link);
 
-int of_phy_register_fixed_link(struct device_node *np)
+int of_phy_parse_fixed_link(struct device_node *np,
+   struct fixed_phy_status *status)
 {
-   struct fixed_phy_status status = {};
struct device_node *fixed_link_node;
const __be32 *fixed_link_prop;
int len;
-   struct phy_device *phy;
 
/* New binding */
fixed_link_node = of_get_child_by_name(np, "fixed-link");
if (fixed_link_node) {
-   status.link = 1;
-   status.duplex = of_property_read_bool(fixed_link_node,
- "full-duplex");
-   if (of_property_read_u32(fixed_link_node, "speed", 
&status.speed))
+   status->link = 1;
+   status->duplex = of_property_read_bool(fixed_link_node,
+  "full-duplex");
+   if (of_property_read_u32(fixed_link_node, "speed",
+&status->speed))
return -EINVAL;
-   status.pause = of_property_read_bool(fixed_link_node, "pause");
-   status.asym_pause = of_property_read_bool(fixed_link_node,
- "asym-pause");
+   status->pause = of_property_read_bool(fixed_link_node,
+ "pause");
+   status->asym_pause = of_property_read_bool(fixed_link_node,
+  "asym-pause");
of_node_put(fixed_link_node);
-   phy = fixed_phy_register(PHY_POLL, &status, np);
-   return IS_ERR(phy) ? PTR_ERR(phy) : 0;
+
+   return 0;
}
 
/* Old binding */
fixed_link_prop = of_get_property(np, "fixed-link", &len);
if (fixed_link_prop && len == (5 * sizeof(__be32))) {
-   status.link = 1;
-   status.duplex = be32_to_cpu(fixed_link_prop[1]);
-   status.speed = be32_to_cpu(fixed_link_prop[2]);
-   status.pause = be32_to_cpu(fixed_link_prop[3]);
-   status.asym_pause = be32_to_cpu(fixed_link_prop[4]);
-   phy = fixed_phy_register(PHY_POLL, &status, np);
-   return IS_ERR(phy) ? PTR_ERR(phy) : 0;
+   status->link = 1;
+   status->duplex = be32_to_cpu(fixed_link_prop[1]);
+   status->speed = be32_to_cpu(fixed_link_prop[2]);
+   status->pause = be32_to_cpu(fixed_link_prop[3]);
+   status->asym_pause = be32_to_cpu(fixed_link_prop[4]);
+
+   return 0;
}
 
return -ENODEV;
 }
+EXPORT_SYMBOL(of_phy_parse_fixed_link);
+
+int of_phy_register_fixed_link(struct device_node *np)
+{
+   struct phy_device *phy;
+   struct fixed_phy_status status = {};
+
+   if (of_phy_parse_fixed_link(np, &status))
+   return -ENODEV;
+
+   phy = fixed_phy_register(PHY_POLL, &status, np);
+   return IS_ERR(phy) ? PTR_ERR(phy) : 0;
+}
 EXPORT_SYMBOL(of_phy_register_fixed_link);
 #endif
diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h
index 8f2237e..311b2cf 100644
--- a/include/linux/of_mdio.h
+++ b/include/linux/of_mdio.h
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+struct fixed_phy_status;
+
 #ifdef CONFIG_OF
 extern int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np);
 extern struct phy_device *of_phy_find_device(struct device_node *phy_np);
@@ -70,9 +72,16 @@ static inline int of_mdio_parse_addr(struct device *dev,
 #endif /* CONFIG_OF */
 
 #if defined(CONFIG_OF) && defined(CONFIG_FIXED_PHY)
+extern int of_phy_parse_fixed_link(struct device_node *np,
+  struct fixed_phy_status *status);
 extern int of_phy_register_fixed_link(struct device_node *np);
 extern bool of_phy_is_fixed_link(struct device_node *np);
 #else
+static inline int of_phy_parse_fixed_link(struct device_node *np,
+ struct fixed_phy_status *status)
+{
+   return -EINVAL;
+}
 static inline int of_phy_register_fixed_link(struct device_node *np)
 {
return -ENOSYS;
-- 
1.7.11.7


Re: [PATCH net-next] vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA

2015-08-05 Thread Thomas Graf

On 08/04/15 at 10:51pm, Alexei Starovoitov wrote:
> IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
> so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
> 'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
> This mode can be used by routing, tc+bpf and ovs.
> Only ovs is strictly flow based, so 'collect metadata' is a better
> name for this tunnel mode.
> 
> Signed-off-by: Alexei Starovoitov 

I have no objection since a flag to enable tx only can be added again if
needed. As stated in the other thread, the tx only mode which is what
VXLAN was capable of doing so far is what motivated the split of flags.

Acked-by: Thomas Graf 

Re: [PATCH net-next] vxlan: expose COLLECT_METADATA flag to user space

2015-08-05 Thread Thomas Graf

On 08/03/15 at 02:14pm, Jesse Gross wrote:
> On Fri, Jul 31, 2015 at 8:41 AM, Alexei Starovoitov  wrote:
> > thanks. I think exposing collect_metadata for vxlan and in the future
> > for other tunnel types is the clean enough way, though the other
> > alternative would be to get rid of collect_metadata flag
> > from the kernel and do it when flowmode flag is set. Thoughts?
> 
> This seems like a good idea to me - I'm not sure that flow based
> tunnels are all that useful without metadata collection enabled and
> the fewer interfaces that we have to create for each tunnel type, the
> better.

The use case at hand, where the flag is useful, is when someone is only
doing flow-based TX, which can be handy if the tunnel ID remains
unused.

Re: [PATCH 5/5] net: rfkill: gpio: remove rfkill_gpio_platform_data

2015-08-05 Thread Andy Shevchenko

On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote:
> No more users for it.
> 
> Signed-off-by: Heikki Krogerus 
> ---
>  include/linux/rfkill-gpio.h | 37 ---
> --
>  net/rfkill/Kconfig  |  3 +--
>  net/rfkill/rfkill-gpio.c|  8 
>  3 files changed, 1 insertion(+), 47 deletions(-)
>  delete mode 100644 include/linux/rfkill-gpio.h
> 
> diff --git a/include/linux/rfkill-gpio.h b/include/linux/rfkill
> -gpio.h
> deleted file mode 100644
> index 20bcb55..000
> --- a/include/linux/rfkill-gpio.h
> +++ /dev/null
> @@ -1,37 +0,0 @@
> -/*
> - * Copyright (c) 2011, NVIDIA Corporation.
> - *
> - * This program is free software; you can redistribute it and/or 
> modify
> - * it under the terms of the GNU General Public License as published 
> by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful, 
> but WITHOUT
> - * ANY WARRANTY; without even the implied warranty of 
> MERCHANTABILITY or
> - * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
> License for
> - * more details.
> - *
> - * You should have received a copy of the GNU General Public License 
> along
> - * with this program; if not, write to the Free Software Foundation, 
> Inc.,
> - * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> - */
> -
> -
> -#ifndef __RFKILL_GPIO_H
> -#define __RFKILL_GPIO_H
> -
> -#include 
> -#include 
> -
> -/**
> - * struct rfkill_gpio_platform_data - platform data for rfkill gpio 
> device.
> - * for unused gpio's, the expected value is -1.
> - * @name:name for the gpio rf kill instance
> - */
> -
> -struct rfkill_gpio_platform_data {
> - char*name;
> - enum rfkill_typetype;
> -};
> -
> -#endif /* __RFKILL_GPIO_H */
> diff --git a/net/rfkill/Kconfig b/net/rfkill/Kconfig
> index 4c10e7e..6320890 100644
> --- a/net/rfkill/Kconfig
> +++ b/net/rfkill/Kconfig
> @@ -40,5 +40,4 @@ config RFKILL_GPIO
>   default n
>   help
> If you say yes here you get support of a generic gpio 
> RFKILL
> -   driver. The platform should fill in the appropriate fields 
> in the
> -   rfkill_gpio_platform_data structure and pass that to the 
> driver.
> +   driver.
> diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
> index 07323c3..69d92e1 100644
> --- a/net/rfkill/rfkill-gpio.c
> +++ b/net/rfkill/rfkill-gpio.c
> @@ -27,8 +27,6 @@
>  #include 
>  #include 
>  
> -#include 
> -
>  struct rfkill_gpio_data {
>   const char  *name;
>   enum rfkill_typetype;
> @@ -89,7 +87,6 @@ static int rfkill_gpio_acpi_probe(struct device 
> *dev,
>  
>  static int rfkill_gpio_probe(struct platform_device *pdev)
>  {
> - struct rfkill_gpio_platform_data *pdata = pdev
> ->dev.platform_data;
>   struct rfkill_gpio_data *rfkill;
>   struct gpio_desc *gpio;
>   const char *type_name;
> @@ -111,11 +108,6 @@ static int rfkill_gpio_probe(struct 
> platform_device *pdev)
>   ret = rfkill_gpio_acpi_probe(&pdev->dev, rfkill);
>   if (ret)
>   return ret;
> - } else if (pdata) {
> - rfkill->name = pdata->name;
> - rfkill->type = pdata->type;
> - } else {
> - return -ENODEV;

Shouldn't we leave the error path and modify it to check if we have device
properties set?

>   }
>  
>   rfkill->clk = devm_clk_get(&pdev->dev, NULL);

-- 
Andy Shevchenko 
Intel Finland Oy
