[PATCH net-next v2 0/2] rtnetlink: new message for stats

2016-04-08 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via
netlink from the kernel. RTM_NEWLINK also dumps stats today, but
RTM_NEWLINK returns a lot more than just stats and is expensive in some
cases when frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.


Roopa Prabhu (2):
  rtnetlink: add new RTM_GETSTATS to dump link stats
  ipv6: add support for stats via RTM_GETSTATS

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 

RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v1 - v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats
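
For context on dropping RTM_DELSTATS: rtnetlink message types are allocated
in groups of four, and the kernel derives the operation kind from the two
low bits of the type (0 = NEW, 1 = DEL, 2 = GET, 3 = SET). That is why
RTM_GETSTATS can stay at 94 while 93 is simply left unused. A minimal
userspace sketch of that arithmetic follows. The demo_* names are made up
for illustration; the numeric values are taken from the patch and from
include/uapi/linux/rtnetlink.h (RTM_BASE is 16).

```c
/* Values copied from the patch / uapi header; DEMO_ prefix marks local copies. */
#define DEMO_RTM_BASE      16
#define DEMO_RTM_NEWSTATS  92
#define DEMO_RTM_GETSTATS  94

/* Operation kinds encoded in the two low bits of an rtnetlink type. */
enum demo_rtm_kind {
	DEMO_KIND_NEW,	/* 0 */
	DEMO_KIND_DEL,	/* 1 */
	DEMO_KIND_GET,	/* 2 */
	DEMO_KIND_SET	/* 3 */
};

/* Kind of a message type: position within its group of four. */
static enum demo_rtm_kind demo_rtm_kind(int type)
{
	return (enum demo_rtm_kind)((type - DEMO_RTM_BASE) & 3);
}

/* Index of the group ("family" of messages) a type belongs to. */
static int demo_rtm_fam(int type)
{
	return (type - DEMO_RTM_BASE) >> 2;
}

/* Same rounding as the RTM_MAX macro: last type of the final group. */
static int demo_rtm_max(int rtm_max_sentinel)
{
	return ((rtm_max_sentinel + 3) & ~3) - 1;
}
```

With __RTM_MAX landing on 95 (one past RTM_GETSTATS), demo_rtm_max(95)
rounds up to the end of the group, which is again 95.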


 include/net/rtnetlink.h|   5 +
 include/uapi/linux/if_link.h   |  19 
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c   | 201 +
 net/ipv6/addrconf.c|  77 ++--
 5 files changed, 301 insertions(+), 8 deletions(-)

-- 
1.9.1

[PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-08 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

This patch also allows for address family (AF) stats; an example of AF
stats for IPv6 is available in the second patch of the series.

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
- IFLA_MPLS_STATS  (nested. for mpls/mdev stats)
- IFLA_EXTENDED_STATS (nested. extended software netdev stats like bridge,
  vlan, vxlan etc)
- IFLA_EXTENDED_HW_STATS (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The mask
is specified in a new header, 'struct if_stats_msg', for stats messages.

Without any attributes in the filter_mask, no stats will be returned.
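
To illustrate the filter mask (a sketch only, not code from the patch): each
stats attribute number maps to one bit in filter_mask, and an attribute is
filled in only if its bit is set. The DEMO_* and demo_* names below are made
up for this example and mirror the enum and macro this patch adds to
if_link.h; 1U is used instead of 1 so that the shift stays unsigned.

```c
#include <stdint.h>

/* Local mirror of the attribute enum from this version of the patch. */
enum {
	DEMO_IFLA_STATS_UNSPEC,
	DEMO_IFLA_STATS_LINK64,
	DEMO_IFLA_STATS_END
};

#define DEMO_IFLA_STATS_MAX (DEMO_IFLA_STATS_END - 1)

/* Same idea as IFLA_STATS_FILTER_BIT(ATTR), but with an unsigned shift. */
#define DEMO_STATS_FILTER_BIT(attr) (1U << (attr))

/* Returns 1 if the caller asked for the given stats attribute. */
static int demo_stats_requested(uint32_t filter_mask, int attr)
{
	return (filter_mask & DEMO_STATS_FILTER_BIT(attr)) != 0;
}

/* Counts how many known stats attributes the mask selects. */
static int demo_count_requested(uint32_t filter_mask)
{
	int attr, count = 0;

	for (attr = DEMO_IFLA_STATS_UNSPEC + 1; attr <= DEMO_IFLA_STATS_MAX; attr++)
		if (demo_stats_requested(filter_mask, attr))
			count++;
	return count;
}
```

An empty mask selects nothing, which matches the statement above that no
stats are returned in that case.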

This patch has been tested with a modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
 include/net/rtnetlink.h|   5 ++
 include/uapi/linux/if_link.h   |  18 
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 200 +
 4 files changed, 228 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1b..fa68158 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -131,6 +131,11 @@ struct rtnl_af_ops {
const struct nlattr *attr);
int (*set_link_af)(struct net_device *dev,
   const struct nlattr *attr);
+   size_t  (*get_link_af_stats_size)(const struct net_device *dev,
+ u32 filter_mask);
+   int (*fill_link_af_stats)(struct sk_buff *skb,
+ const struct net_device *dev,
+ u32 filter_mask);
 };
 
 void __rtnl_af_unregister(struct rtnl_af_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9427f17..4cfd029 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -780,4 +780,22 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+   RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
__RTM_MAX,
 #define RTM_MAX(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a75f7e9..d1fba58 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3451,6 +3451,203 @@ out:
return err;
 }
 
+static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
+  int type, u32 pid, u32 seq, u32 change,
+  unsigned int flags, unsigned int filter_mask)
+{
+   const struct rtnl_link_stats64 *stats;
+   struct rtnl_link_stats64 temp;
+   struct if_stats_msg *ifsm;
+   struct nlmsghdr *nlh;
+   struct rtnl_af_ops *af_ops;
+   struct nlattr *attr;
+
+   ASSERT_RTNL();
+
+   nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifsm), flags);
+   if (!nlh)
+   return -EMSGSIZE;
+
+   ifsm = nlmsg_data(nlh);
+   ifsm->ifindex = dev->ifindex;
+   ifsm->filter_mask = filter_mask;
+
+   if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK64)) {
+   attr = nla_reserve(skb, IFLA_STATS_LINK64,
+

[PATCH net-next 0/2] rtnetlink: new message for stats

2016-03-12 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via
netlink from the kernel. RTM_NEWLINK also dumps stats today, but
RTM_NEWLINK returns a lot more than just stats and is expensive in some
cases when frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.


Roopa Prabhu (2):
  rtnetlink: add new RTM_GETSTATS to dump link stats
  ipv6: add support for stats via RTM_GETSTATS

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 

RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names


 include/net/rtnetlink.h|   5 +
 include/uapi/linux/if_link.h   |  19 
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c   | 201 +
 net/ipv6/addrconf.c|  77 ++--
 5 files changed, 301 insertions(+), 8 deletions(-)

-- 
1.9.1

[PATCH net-next 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-03-12 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

This patch also allows for address family (AF) stats; an example of AF
stats for IPv6 is available in the second patch of the series.

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
- IFLA_MPLS_STATS  (nested. for mpls/mdev stats)
- IFLA_EXTENDED_STATS (nested. extended software netdev stats like bridge,
  vlan, vxlan etc)
- IFLA_EXTENDED_HW_STATS (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The mask
is specified in a new header, 'struct if_stats_msg', for stats messages.

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
 include/net/rtnetlink.h|   5 ++
 include/uapi/linux/if_link.h   |  19 
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c   | 200 +
 4 files changed, 231 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1b..fa68158 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -131,6 +131,11 @@ struct rtnl_af_ops {
const struct nlattr *attr);
int (*set_link_af)(struct net_device *dev,
   const struct nlattr *attr);
+   size_t  (*get_link_af_stats_size)(const struct net_device *dev,
+ u32 filter_mask);
+   int (*fill_link_af_stats)(struct sk_buff *skb,
+ const struct net_device *dev,
+ u32 filter_mask);
 };
 
 void __rtnl_af_unregister(struct rtnl_af_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 249eef9..0840f3e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -741,4 +741,23 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK64,
+   IFLA_STATS_INET6,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..2bbb300 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -139,6 +139,13 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+   RTM_DELSTATS = 93,
+#define RTM_DELSTATS RTM_DELSTATS
+   RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
__RTM_MAX,
 #define RTM_MAX(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2d9e5e..d1e3d17 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3410,6 +3410,203 @@ out:
return err;
 }
 
+static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
+  int type, u32 pid, u32 seq, u32 change,
+  unsigned int flags, unsigned int filter_mask)
+{
+   const struct rtnl_link_stats64 *stats;
+   struct rtnl_link_stats64 temp;
+   struct if_stats_msg *ifsm;
+   struct nlmsghdr *nlh;
+   struct rtnl_af_ops *af_ops;
+   struct nlattr *attr;
+
+   ASSERT_RTNL();
+
+   nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifsm), flags);
+   if (!nlh)
+   return -EMSGSIZE;
+
+   ifsm = nlmsg_data(nlh);
+   ifsm->ifindex = dev->ifindex;
+   ifsm->filter_mask = filter_mask;
+
+   if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK64)) {
+

[PATCH net-next 2/2] ipv6: add support for stats via RTM_GETSTATS

2016-03-12 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch is an example of adding af stats in
RTM_GETSTATS. It adds a new nested IFLA_STATS_INET6
attribute for ipv6 af stats. stats attributes inside
IFLA_STATS_INET6 nested attribute use the existing ipv6 stats
attributes from ipv6 IFLA_PROTINFO (I can certainly declare
new attributes if required)
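
As a rough illustration of how the size of such a nested attribute is
estimated (a userspace sketch, not code from this patch): every netlink
attribute carries a 4-byte header and is padded to a 4-byte boundary, and a
nest is just an attribute whose payload is other attributes. The demo_*
helpers below are hypothetical stand-ins for the kernel's NLA_ALIGN() and
nla_total_size(); the counter counts are passed as parameters instead of
using IPSTATS_MIB_MAX and ICMP6_MIB_MAX.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NLA_HDRLEN ((size_t)4)	/* u16 nla_len + u16 nla_type */

/* Round up to the 4-byte netlink attribute alignment. */
static size_t demo_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Header plus payload, padded: what one attribute occupies in a message. */
static size_t demo_nla_total_size(size_t payload)
{
	return demo_nla_align(DEMO_NLA_HDRLEN + payload);
}

/*
 * Estimated size of a nest holding two arrays of u64 counters:
 * the nest header itself, then one attribute per counter array.
 */
static size_t demo_inet6_stats_size(size_t ip_counters, size_t icmp_counters)
{
	return demo_nla_total_size(0)	/* nest header only */
	     + demo_nla_total_size(ip_counters * sizeof(uint64_t))
	     + demo_nla_total_size(icmp_counters * sizeof(uint64_t));
}
```

The patch sizes the nest header with nla_total_size(sizeof(struct nlattr)),
which reserves a few bytes more than the sketch does; as far as I can tell
that only makes the size estimate slightly larger.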

Signed-off-by: Roopa Prabhu 
---
I have added this patch only to show an example of af stats.
I have tested it to work. My real intent is to have
IFLA_STATS_MPLS implemented in the same way for mpls.
We could rethink ipv6 stats in a new way instead of carrying
over the older ipv6 stats, which I am doing here.
I am not sure how popular the current ipv6 stats are.
I have not used them. Adding this note here if people prefer
dropping this and revisiting ipv6 stats at a later point.

 net/core/rtnetlink.c |  1 +
 net/ipv6/addrconf.c  | 77 ++--
 2 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d1e3d17..00a04e5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3464,6 +3464,7 @@ nla_put_failure:
 
 static const struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK64] = { .len = sizeof(struct rtnl_link_stats64) },
+   [IFLA_STATS_INET6]  = { .type = NLA_NESTED },
 };
 
 static size_t rtnl_link_get_af_stats_size(const struct net_device *dev,
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8c0dab2..d700647 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4911,6 +4911,29 @@ static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
}
 }
 
+static int inet6_fill_ifla6_stats(struct sk_buff *skb,
+ struct inet6_dev *idev)
+{
+   struct nlattr *nla;
+
+   nla = nla_reserve(skb, IFLA_INET6_STATS, IPSTATS_MIB_MAX * sizeof(u64));
+   if (!nla)
+   goto nla_put_failure;
+   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_STATS, nla_len(nla));
+
+   nla = nla_reserve(skb, IFLA_INET6_ICMP6STATS,
+ ICMP6_MIB_MAX * sizeof(u64));
+   if (!nla)
+   goto nla_put_failure;
+   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS,
+nla_len(nla));
+
+   return 0;
+
+nla_put_failure:
+   return -EMSGSIZE;
+}
+
 static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
  u32 ext_filter_mask)
 {
@@ -4935,15 +4958,8 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
if (ext_filter_mask & RTEXT_FILTER_SKIP_STATS)
return 0;
 
-   nla = nla_reserve(skb, IFLA_INET6_STATS, IPSTATS_MIB_MAX * sizeof(u64));
-   if (!nla)
-   goto nla_put_failure;
-   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_STATS, nla_len(nla));
-
-   nla = nla_reserve(skb, IFLA_INET6_ICMP6STATS, ICMP6_MIB_MAX * sizeof(u64));
-   if (!nla)
+   if (inet6_fill_ifla6_stats(skb, idev))
goto nla_put_failure;
-   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS, nla_len(nla));
 
nla = nla_reserve(skb, IFLA_INET6_TOKEN, sizeof(struct in6_addr));
if (!nla)
@@ -4985,6 +5001,49 @@ static int inet6_fill_link_af(struct sk_buff *skb, const struct net_device *dev,
return 0;
 }
 
+static size_t inet6_get_link_af_stats_size(const struct net_device *dev,
+  u32 filter_mask)
+{
+   if (!(filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_INET6)))
+   return 0;
+
+   if (!__in6_dev_get(dev))
+   return 0;
+
+   return nla_total_size(sizeof(struct nlattr)) /* IFLA_STATS_INET6 */
+   + nla_total_size(IPSTATS_MIB_MAX * 8) /* IFLA_INET6_STATS */
+   + nla_total_size(ICMP6_MIB_MAX * sizeof(u64)); /* IFLA_INET6_ICMP6STATS */
+}
+
+static int inet6_fill_link_af_stats(struct sk_buff *skb,
+   const struct net_device *dev,
+   u32 filter_mask)
+{
+   struct inet6_dev *idev = __in6_dev_get(dev);
+   struct nlattr *inet6_stats;
+
+   if (!(filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_INET6)))
+   return 0;
+
+   if (!idev)
+   return -ENODATA;
+
+   inet6_stats = nla_nest_start(skb, IFLA_STATS_INET6);
+   if (!inet6_stats)
+   return -EMSGSIZE;
+
+   if (inet6_fill_ifla6_stats(skb, idev) < 0)
+   goto errout;
+
+   nla_nest_end(skb, inet6_stats);
+
+   return 0;
+errout:
+   nla_nest_cancel(skb, inet6_stats);
+
+   return -EMSGSIZE;
+}
+
 static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 {
struct inet6_ifaddr *ifp;
@@ -6079,6 +6138,8 @@ static struct rtnl_af_ops inet6_ops __read_mostly = {
.g

[PATCH net-next WIP] ethtool: generic netlink policy

2016-04-10 Thread Roopa Prabhu

From: Roopa Prabhu 

Netlink for ethtool came up at netconf/netdev, and we had promised to
send some of the ethtool netlink code we have.
We use a generic netlink channel for ethtool between our kernel and
user space driver. This ethtool channel nicely wraps most ethtool
commands into genl messages, and it is capable of handling delayed
remote ops to userspace in some cases (dropping rtnl etc). We also use
this channel to cache some of this ethtool data in the kernel.
In this patch I have included just the genl policy for ethtool, which
will apply to the generic use case. We can certainly share the rest of
it if we see a use case. In particular, the remote handling of ethtool
ops for delayed hw operations may be useful in other cases (today they
are tied to our remote driver in userspace). The ethtool handlers for
genl use the existing ethtool structs and call into the
respective driver handlers.

This came up again at the switchdev discussion recently and I had
promised to get this out this weekend :). This patch does not include
the changes needed to compile the code.

We should move ethtool to netlink at some point, and I think we
should also explore the possibility of including it in the existing
new devlink generic netlink infrastructure. Ethtool stats should
move to the new stats infrastructure.

Signed-off-by: Roopa Prabhu 
Signed-off-by: Shrijeet Mukherjee 
---
 net/core/ethtool_netlink.c | 200 +
 1 file changed, 200 insertions(+)
 create mode 100644 net/core/ethtool_netlink.c

diff --git a/net/core/ethtool_netlink.c b/net/core/ethtool_netlink.c
new file mode 100644
index 000..f5445f3
--- /dev/null
+++ b/net/core/ethtool_netlink.c
@@ -0,0 +1,200 @@
+/*
+ *  net/core/ethtool_netlink.c - generic ethtool netlink handler
+ *  Copyright (C) 2015 Cumulus Networks
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const struct nla_policy ethtool_policy[ETHTOOL_ATTR_MAX + 1] = {
+   [ETHTOOL_ATTR_IFINDEX]  = { .type = NLA_U32 },
+   [ETHTOOL_ATTR_FLAGS]= { .type = NLA_U32 },
+   [ETHTOOL_ATTR_PHYS_ID_STATE]= { .type = NLA_U8 },
+   [ETHTOOL_ATTR_SETTINGS] = { .type = NLA_BINARY,
+   .len = sizeof(struct ethtool_cmd) },
+   [ETHTOOL_ATTR_PAUSE]= { .type = NLA_BINARY,
+   .len = sizeof(struct ethtool_pauseparam) },
+   [ETHTOOL_ATTR_MODINFO]  = { .type = NLA_BINARY,
+   .len = sizeof(struct ethtool_modinfo) },
+   [ETHTOOL_ATTR_EEPROM]   = { .type = NLA_BINARY,
+   .len = sizeof(struct ethtool_eeprom) },
+   [ETHTOOL_ATTR_EEPROM_DATA]  = { .type = NLA_BINARY },
+   [ETHTOOL_ATTR_STATS]= { .type = NLA_NESTED },
+   [ETHTOOL_ATTR_STAT] = { .type = NLA_U32 },
+   [ETHTOOL_ATTR_STRINGS]  = { .type = NLA_NESTED },
+   [ETHTOOL_ATTR_STRING]   = { .type = NLA_STRING,
+   .len = ETH_GSTRING_LEN },
+   [ETHTOOL_ATTR_SSET] = { .type = NLA_U32 },
+   [ETHTOOL_ATTR_SSET_COUNT]   = { .type = NLA_U32 },
+};
+
+static struct genl_family ethtool_family = {
+   .id = GENL_ID_GENERATE,
+   .name = "ethtool_family",
+   .version = 1,
+   .maxattr = ETHTOOL_ATTR_MAX,
+};
+
+static struct genl_multicast_group ethtool_mcgrp[] = {
+   { .name = "port_mc", },
+};
+
+static LIST_HEAD(wq_list);
+
+static struct genl_ops ethtool_ops[] = {
+   {
+   .cmd = ETHTOOL_CMD_GET_SETTINGS,
+   .policy = ethtool_policy,
+   .doit = ethtool_get_settings,
+   },
+   {
+   .cmd = ETHTOOL_CMD_SET_SETTINGS,
+   .policy = ethtool_policy,
+   .doit = ethtool_set_settings,
+   },
+   {
+   .cmd = ETHTOOL_CMD_GET_PAUSE,
+   .policy = ethtool_policy,
+   .doit = ethtool_get_pause,
+   },
+   {
+   .cmd = ETHTOOL_CMD_SET_PAUSE,
+   .policy = ethtool_policy,
+   .doit = ethtool_set_pause,
+   },
+   {
+   .cmd = ETHTOOL_CMD_GET_MODULE_INFO,
+   .policy = ethtool_policy,
+   .doit = ethtool_get_module_info,
+   },
+   {
+   .cmd = ETHTOOL_CMD_SET_MODULE_INFO,
+

[PATCH net-next v3 RFC 2/2] ipv6: add support for stats via RTM_GETSTATS

2016-04-15 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch is an example of adding af stats in
RTM_GETSTATS. It adds a new nested IFLA_STATS_LINK_INET6
attribute for ipv6 af stats. stats attributes inside
IFLA_STATS_LINK_INET6 nested attribute use the existing ipv6
stats attributes from ipv6 IFLA_PROTINFO

Signed-off-by: Roopa Prabhu 
---
This patch is an example of af stats hooked into the new stats
infrastructure. I have tested it to work. My real intent is to have
IFLA_STATS_LINK_MPLS implemented in the same way for mpls.
I am not sure how popular the current ipv6 stats are, so we could
rethink ipv6 stats in a new way when people see the need
for it in the future.

 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c |  1 +
 net/ipv6/addrconf.c  | 77 +++-
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ab740fe..a419a6a2 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -796,6 +796,7 @@ struct if_stats_msg {
 enum {
IFLA_STATS_UNSPEC,
IFLA_STATS_LINK_64,
+   IFLA_STATS_LINK_INET6,
__IFLA_STATS_MAX,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2a8abe0..687718a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3497,6 +3497,7 @@ nla_put_failure:
 
 static const struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]= { .len = sizeof(struct rtnl_link_stats64) },
+   [IFLA_STATS_LINK_INET6] = { .type = NLA_NESTED },
 };
 
 static size_t rtnl_link_get_af_stats_size(const struct net_device *dev,
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a6c9927..fdca37c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4917,6 +4917,29 @@ static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
}
 }
 
+static int inet6_fill_ifla6_stats(struct sk_buff *skb,
+ struct inet6_dev *idev)
+{
+   struct nlattr *nla;
+
+   nla = nla_reserve(skb, IFLA_INET6_STATS, IPSTATS_MIB_MAX * sizeof(u64));
+   if (!nla)
+   goto nla_put_failure;
+   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_STATS, nla_len(nla));
+
+   nla = nla_reserve(skb, IFLA_INET6_ICMP6STATS,
+ ICMP6_MIB_MAX * sizeof(u64));
+   if (!nla)
+   goto nla_put_failure;
+   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS,
+nla_len(nla));
+
+   return 0;
+
+nla_put_failure:
+   return -EMSGSIZE;
+}
+
 static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
  u32 ext_filter_mask)
 {
@@ -4941,15 +4964,8 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev,
if (ext_filter_mask & RTEXT_FILTER_SKIP_STATS)
return 0;
 
-   nla = nla_reserve(skb, IFLA_INET6_STATS, IPSTATS_MIB_MAX * sizeof(u64));
-   if (!nla)
-   goto nla_put_failure;
-   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_STATS, nla_len(nla));
-
-   nla = nla_reserve(skb, IFLA_INET6_ICMP6STATS, ICMP6_MIB_MAX * sizeof(u64));
-   if (!nla)
+   if (inet6_fill_ifla6_stats(skb, idev))
goto nla_put_failure;
-   snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS, nla_len(nla));
 
nla = nla_reserve(skb, IFLA_INET6_TOKEN, sizeof(struct in6_addr));
if (!nla)
@@ -4991,6 +5007,49 @@ static int inet6_fill_link_af(struct sk_buff *skb, const struct net_device *dev,
return 0;
 }
 
+static size_t inet6_get_link_af_stats_size(const struct net_device *dev,
+  u32 filter_mask)
+{
+   if (!(filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_INET6)))
+   return 0;
+
+   if (!__in6_dev_get(dev))
+   return 0;
+
+   return nla_total_size(sizeof(struct nlattr)) /* IFLA_STATS_LINK_INET6 */
+   + nla_total_size(IPSTATS_MIB_MAX * 8) /* IFLA_INET6_STATS */
+   + nla_total_size(ICMP6_MIB_MAX * sizeof(u64)); /* IFLA_INET6_ICMP6STATS */
+}
+
+static int inet6_fill_link_af_stats(struct sk_buff *skb,
+   const struct net_device *dev,
+   u32 filter_mask)
+{
+   struct inet6_dev *idev = __in6_dev_get(dev);
+   struct nlattr *inet6_stats;
+
+   if (!(filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_INET6)))
+   return 0;
+
+   if (!idev)
+   return -ENODATA;
+
+   inet6_stats = nla_nest_start(skb, IFLA_STATS_LINK_INET6);
+   if (!inet6_stats)
+   return -EMSGSIZE;
+
+   if (inet6_fill_ifla6_stats(skb, idev) < 0)
+   goto errout;
+
+   nla_nest_end(skb, inet6_stats);
+
+   return 0;
+errout:
+   nla_nest_cancel(skb, inet6_stats);
+

[PATCH net-next v3 0/2] rtnetlink: new message for stats

2016-04-15 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query stats via
netlink from the kernel. RTM_NEWLINK also dumps link stats today, but
RTM_NEWLINK returns a lot more than just stats and is expensive in some
cases when frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only stats from the kernel. The idea is to also keep
it extensible so that new kinds of stats can be added to it in the future.

Roopa Prabhu (2):
  rtnetlink: add new RTM_GETSTATS to dump link stats
  ipv6: add support for stats via RTM_GETSTATS

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 

RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v1 - v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats

v2 - v3:
- add required padding to the if_stats_msg structure (suggested by Jamal)
- rename netdev stat attributes with the IFLA_STATS_LINK prefix
  so that they are easily distinguishable from global
  stats in the future (after global stats discussion with Thomas)
- get rid of unnecessary copy when getting stats with dev_get_stats
  (suggested by Dave)
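
On the padding change, a small sketch of what the explicit pad fields buy
(my reading of the change, not text from the review): with a lone __u8 in
front of a __u32, the compiler inserts implicit padding whose size depends
on the ABI's alignment rules and whose bytes have no name. Declaring pad1
and pad2 fixes ifindex at offset 4 everywhere and gives those bytes names
that can be zeroed and checked. The demo_* structs below are local copies
for illustration, using fixed-width userspace types in place of __u8/__u32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* v2 layout: padding after 'family' is left to the compiler. */
struct demo_if_stats_msg_v2 {
	uint8_t  family;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* v3 layout: the same bytes, but the padding is explicit. */
struct demo_if_stats_msg_v3 {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Offset of ifindex in the explicitly padded header. */
static size_t demo_v3_ifindex_offset(void)
{
	return offsetof(struct demo_if_stats_msg_v3, ifindex);
}

/* Zero the whole header first so the pad fields go out as zero. */
static void demo_init_v3(struct demo_if_stats_msg_v3 *m,
			 uint32_t ifindex, uint32_t filter_mask)
{
	memset(m, 0, sizeof(*m));
	m->ifindex = ifindex;
	m->filter_mask = filter_mask;
}
```

On common ABIs both structs end up 12 bytes long with ifindex at offset 4;
the v3 form makes that explicit instead of relying on compiler padding.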


 include/net/rtnetlink.h|   5 +
 include/uapi/linux/if_link.h   |  19 
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c   | 201 +
 net/ipv6/addrconf.c|  77 ++--
 5 files changed, 301 insertions(+), 8 deletions(-)

-- 
1.9.1

[PATCH net-next v3 1/2] rtnetlink: add new RTM_GETSTATS message to query stats

2016-04-15 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query stats via netlink
from the kernel. RTM_NEWLINK also dumps link stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only stats from the kernel. The idea is to also
keep it extensible so that new kinds of stats can be added to it in
the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

This patch also allows for address family (AF) stats; an example of AF
stats for IPv6 is available in the second patch of the series.

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.
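
A minimal userspace sketch of what such a request could look like, assuming
the v3 header layout from this patch. The DEMO_* and demo_* names are local
to the example; RTM_GETSTATS is defined by hand because released uapi
headers may not carry it yet, and mapping "ifindex 0" to a dump is a choice
made by this helper, not something the patch mandates.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RTM_GETSTATS		94	/* value from this patch */
#define DEMO_IFLA_STATS_LINK_64		1	/* value from this patch */
#define DEMO_STATS_FILTER_BIT(attr)	(1U << (attr))

/* Local copy of the v3 'struct if_stats_msg'. */
struct demo_if_stats_msg {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Netlink header immediately followed by the stats header. */
struct demo_stats_req {
	struct nlmsghdr nlh;
	struct demo_if_stats_msg ifsm;
};

/*
 * Build a request for the 64-bit link stats of one interface, or a dump
 * of all interfaces when ifindex is 0 (helper convention, see above).
 */
static void demo_build_stats_req(struct demo_stats_req *req,
				 uint32_t ifindex, uint32_t seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifsm));
	req->nlh.nlmsg_type = DEMO_RTM_GETSTATS;
	req->nlh.nlmsg_flags = NLM_F_REQUEST;
	if (!ifindex)
		req->nlh.nlmsg_flags |= NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifsm.ifindex = ifindex;
	req->ifsm.filter_mask = DEMO_STATS_FILTER_BIT(DEMO_IFLA_STATS_LINK_64);
}
```

The filled structure would then be written to a NETLINK_ROUTE socket with
send(); socket setup and reply parsing are left out of the sketch.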

Future possible new types of stat attributes:
- IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
- IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
  vlan, vxlan etc)
- IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of stats attributes to query. The filter
mask is specified in the new header 'struct if_stats_msg' for stats
messages. The other important field in the header is the ifindex.

This API can be used for global stats (e.g. tcp) in the future. When global
stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, netdev specific stat
attribute names are prefixed with IFLA_STATS_LINK_.

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with a modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
 include/net/rtnetlink.h|   5 ++
 include/uapi/linux/if_link.h   |  23 +
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 199 +
 4 files changed, 232 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1b..fa68158 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -131,6 +131,11 @@ struct rtnl_af_ops {
const struct nlattr *attr);
int (*set_link_af)(struct net_device *dev,
   const struct nlattr *attr);
+   size_t  (*get_link_af_stats_size)(const struct net_device *dev,
+ u32 filter_mask);
+   int (*fill_link_af_stats)(struct sk_buff *skb,
+ const struct net_device *dev,
+ u32 filter_mask);
 };
 
 void __rtnl_af_unregister(struct rtnl_af_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9427f17..ab740fe 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -780,4 +780,27 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+/* A stats attribute can be netdev specific or a global stat.
+ * For netdev stats, lets use the prefix IFLA_STATS_LINK_*
+ */
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+   RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
__RTM_MAX,
 #define RTM_MAX(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a7a3d34..2a8abe0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3444,6 +3444,202 @@ out:
return err;
 }
 
+static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
+  int type, u32 pid, u32 seq, u32 change,
+  unsigned int flags, unsigned int filter_mask)
+{
+   struct if_stats_msg *ifsm;
+   struct nlmsghdr *nlh;
+   struct rtnl_af_ops *af_ops;
+

[PATCH net-next] rtnetlink: rtnl_fill_stats: avoid an unnecessary stats copy

2016-04-15 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch passes netlink attr data ptr directly to dev_get_stats
thus eliminating a stats copy.

Suggested-by: David Miller 
Signed-off-by: Roopa Prabhu 
---
 net/core/rtnetlink.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a75f7e9..a7a3d34 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -808,11 +808,6 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
a->rx_nohandler = b->rx_nohandler;
 }
 
-static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
-{
-   memcpy(v, b, sizeof(*b));
-}
-
 /* All VF info */
 static inline int rtnl_vfinfo_size(const struct net_device *dev,
   u32 ext_filter_mask)
@@ -1054,25 +1049,23 @@ static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
 static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
  struct net_device *dev)
 {
-   const struct rtnl_link_stats64 *stats;
-   struct rtnl_link_stats64 temp;
+   struct rtnl_link_stats64 *sp;
struct nlattr *attr;
 
-   stats = dev_get_stats(dev, &temp);
-
-   attr = nla_reserve(skb, IFLA_STATS,
-  sizeof(struct rtnl_link_stats));
+   attr = nla_reserve(skb, IFLA_STATS64,
+  sizeof(struct rtnl_link_stats64));
if (!attr)
return -EMSGSIZE;
 
-   copy_rtnl_link_stats(nla_data(attr), stats);
+   sp = nla_data(attr);
+   dev_get_stats(dev, sp);
 
-   attr = nla_reserve(skb, IFLA_STATS64,
-  sizeof(struct rtnl_link_stats64));
+   attr = nla_reserve(skb, IFLA_STATS,
+  sizeof(struct rtnl_link_stats));
if (!attr)
return -EMSGSIZE;
 
-   copy_rtnl_link_stats64(nla_data(attr), stats);
+   copy_rtnl_link_stats(nla_data(attr), sp);
 
return 0;
 }
-- 
1.9.1

[PATCH net-next v4] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-17 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
link af stats:
- IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
- IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
extended stats:
- IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like
  bridge, vlan, vxlan etc)
- IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The filter
mask is specified in the new header 'struct if_stats_msg' for stats messages.
The other important field in the header is the ifindex.

This api can also include attributes for global stats (eg tcp) in the future.
When global stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, the names of netdev specific
stat attributes are prefixed with IFLA_STATS_LINK_.

Without any attributes in the filter_mask, no stats will be returned.
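
As a rough illustration of the filter mask described above, the sketch below
builds the header for a single-interface query and checks which attributes a
mask selects. The struct, enum and macro are local mirrors of the uapi
definitions in this patch; stats_request() and stats_attr_requested() are
hypothetical helper names that are not part of the patch.

```c
#include <stdint.h>

/* Local mirrors of the uapi definitions quoted in this patch. */
enum {
	IFLA_STATS_UNSPEC,
	IFLA_STATS_LINK_64,
	__IFLA_STATS_MAX,
};
#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
#define IFLA_STATS_FILTER_BIT(ATTR) (1 << (ATTR))

struct if_stats_msg {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Hypothetical helper: header for a query on one interface.
 * All fields that are not named are zero-initialized.
 */
static struct if_stats_msg stats_request(uint32_t ifindex, uint32_t filter_mask)
{
	struct if_stats_msg ifsm = {
		.ifindex = ifindex,
		.filter_mask = filter_mask,
	};

	return ifsm;
}

/* Hypothetical helper: non-zero if the mask selects attribute attr. */
static int stats_attr_requested(uint32_t filter_mask, int attr)
{
	return (filter_mask & IFLA_STATS_FILTER_BIT(attr)) != 0;
}
```

With these definitions, stats_request(2, IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64))
asks for the 64-bit link stats of ifindex 2, and an empty mask selects nothing.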

This patch has been tested with modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---

RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats

v3:
- add required padding to the if_stats_msg structure(suggested by jamal)
- rename netdev stat attributes with IFLA_STATS_LINK prefix
  so that they are easily distinguishable with global
  stats in the future (after global stats discussion with thomas)
- get rid of unnecessary copy when getting stats with dev_get_stats
  (suggested by dave)

v4:
- dropped calcit and af stats from this patch. Will add it
  back when it becomes necessary and with the first af stats
  patch
- add check for null filter in dump and return -EINVAL:
  this follows rtnl_fdb_dump in returning an error.
  But since netlink_dump does not propagate the error
  to the user, the user will not see an error
  but will also not see any data. This is consistent with
  other kinds of dumps.
 
 include/uapi/linux/if_link.h   |  23 +++
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 150 +
 3 files changed, 178 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb3a90b..0762f35 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -781,4 +781,27 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+/* A stats attribute can be netdev specific or a global stat.
+ * For netdev stats, lets use the prefix IFLA_STATS_LINK_*
+ */
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTAT

[PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
link af stats:
- IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
- IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
extended stats:
- IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like
  bridge, vlan, vxlan etc)
- IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The filter
mask is specified in the new header 'struct if_stats_msg' for stats messages.
The other important field in the header is the ifindex.

This api can also include attributes for global stats (eg tcp) in the future.
When global stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, the names of netdev specific
stat attributes are prefixed with IFLA_STATS_LINK_.

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats

v3:
- add required padding to the if_stats_msg structure(suggested by jamal)
- rename netdev stat attributes with IFLA_STATS_LINK prefix
  so that they are easily distinguishable with global
  stats in the future (after global stats discussion with thomas)
- get rid of unnecessary copy when getting stats with dev_get_stats
  (suggested by dave)

v4:
- dropped calcit and af stats from this patch. Will add it
  back when it becomes necessary and with the first af stats
  patch
- add check for null filter in dump and return -EINVAL:
  this follows rtnl_fdb_dump in returning an error.
  But since netlink_dump does not propagate the error
  to the user, the user will not see an error
  but will also not see any data. This is consistent with
  other kinds of dumps.

v5:
- fix selinux nlmsgtab to account for new RTM_*STATS messages

 include/uapi/linux/if_link.h   |  23 +++
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 150 +
 security/selinux/nlmsgtab.c|   4 +-
 4 files changed, 181 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb3a90b..0762f35 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -781,4 +781,27 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+/* A stats attribute can be netdev specific or a global stat.
+ * For netdev stats, lets use the prefix IFLA_STATS_LINK_*
+ */
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnet

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu

On 4/18/16, 7:22 PM, Eric Dumazet wrote:
> On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote:
>
>> And anyways, I get unaligned accesses without Roopa's changes :-/
>>
>> davem@patience:~$ ip l l
>> [3391066.656729] Kernel unaligned access at TPC[7d6c14] loopback_get_stats64+0x74/0xa0
>> [3391066.672020] Kernel unaligned access at TPC[7d6c18] loopback_get_stats64+0x78/0xa0
>> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] loopback_get_stats64+0x7c/0xa0
>> [3391066.702573] Kernel unaligned access at TPC[7d6c20] loopback_get_stats64+0x80/0xa0
>> [3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0
> Yes, rtnl_fill_stats() probably has the same mistake.
>
> commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c
> Author: Roopa Prabhu 
> Date:   Fri Apr 15 20:36:25 2016 -0700
>
> rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy
> 
> This patch passes netlink attr data ptr directly to dev_get_stats
> thus eliminating a stats copy.
> 
> Suggested-by: David Miller 
> Signed-off-by: Roopa Prabhu 
> Signed-off-by: David S. Miller 
>
>
>
David, if you revert the one in rtnl_fill_stats, I will take care of the
dev_get_stats in RTM_GETSTATS in v6.

thanks,
Roopa

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu

On 4/18/16, 8:41 PM, David Miller wrote:
> From: Roopa Prabhu 
> Date: Mon, 18 Apr 2016 14:10:19 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via
>> netlink from the kernel. RTM_NEWLINK also dumps stats today, but
>> RTM_NEWLINK returns a lot more than just stats and is expensive in
>> some cases when frequent polling for stats from userspace is a
>> common operation.
> I'm holding off on this until we sort out the 64-bit netlink
> attribute alignment issue.
sure,
>
> Meanwhile, I'll put some kind of a fix into the tree for the
> rtnl_fill_stats() change so that it doesn't cause unaligned
> accesses.
>
> I just tested out a clever idea, where for architectures where
> unaligned accesses are a problem, we insert a zero length NOP attribute
> before the 64-bit stats.  This makes it properly aligned.  A quick
> hack patch just passed testing on my sparc64 box, but I'll go over it
> some more.
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index bb3a90b..5ffdcb3 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -155,6 +155,7 @@ enum {
>   IFLA_PROTO_DOWN,
>   IFLA_GSO_MAX_SEGS,
>   IFLA_GSO_MAX_SIZE,
> + IFLA_PAD,
>   __IFLA_MAX
>  };
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a7a3d34..b192576 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1052,6 +1052,15 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
>   struct rtnl_link_stats64 *sp;
>   struct nlattr *attr;
>  
> + /* Add a zero length NOP attribute so that the nla_data()
> +  * of the IFLA_STATS64 will be 64-bit aligned.
> +  */
> +#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
> + attr = nla_reserve(skb, IFLA_PAD, 0);
> + if (!attr)
> + return -EMSGSIZE;
> +#endif
> +
>   attr = nla_reserve(skb, IFLA_STATS64,
>  sizeof(struct rtnl_link_stats64));
>   if (!attr)
>
that is clever :)

thanks!

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-19 Thread Roopa Prabhu

On 4/19/16, 11:31 AM, David Miller wrote:

[snip]
>
> Here is the final patch I'm about to push out, thanks a lot Eric.
>
> Roopa, please adjust your GETSTATS patch as needed (I think you need
> to adjust the SELinux table entry as well) and we can integrate that
> too.
ok, will do. One thing though: for GETSTATS, if I need a pad attribute like
IFLA_PAD, I will need to add a new stats attribute IFLA_STATS_PAD and burn a
bit for it in filter_mask too. In that case, I am wondering if we should live
with the copy. I will take any suggestions here.

I had adjusted the SELinux table entries for v5. I will check again and make
sure it is right for v6.

>
> 
> [PATCH] net: Align IFLA_STATS64 attributes properly on architectures that need it.
>
> Since the nlattr header is 4 bytes in size, it can cause the netlink
> attribute payload to not be 8-byte aligned.
>
> This is particularly troublesome for IFLA_STATS64 which contains 64-bit
> statistic values.
>
> Solve this by creating a dummy IFLA_PAD attribute which has a payload
> which is zero bytes in size.  When HAVE_EFFICIENT_UNALIGNED_ACCESS is
> false, we insert an IFLA_PAD attribute into the netlink response when
> necessary such that the IFLA_STATS64 payload will be properly aligned.
>
> With help and suggestions from Eric Dumazet.
>
> Signed-off-by: David S. Miller 
[snip]


Thanks David.
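
The alignment arithmetic behind the pad attribute can be sketched in a few
lines. This is only an illustration and not the code from the patch above: it
assumes attribute offsets are measured from an 8-byte aligned start and are
always multiples of 4 (netlink attributes are 4-byte aligned), and the names
SKETCH_NLA_HDRLEN, needs_pad_attr() and payload_offset_64bit() are
hypothetical.

```c
#include <stddef.h>

/* Size of the netlink attribute header (u16 length + u16 type). */
#define SKETCH_NLA_HDRLEN 4u

/* Non-zero if an attribute placed at attr_offset would have a payload
 * that is not 8-byte aligned. attr_offset is assumed to be a multiple
 * of 4, counted from an 8-byte aligned start.
 */
static int needs_pad_attr(size_t attr_offset)
{
	return ((attr_offset + SKETCH_NLA_HDRLEN) % 8) != 0;
}

/* Offset of the 64-bit payload after optionally inserting a zero-length
 * pad attribute. A zero-length attribute occupies only its 4-byte header,
 * which shifts the following payload onto an 8-byte boundary.
 */
static size_t payload_offset_64bit(size_t attr_offset)
{
	if (needs_pad_attr(attr_offset))
		attr_offset += SKETCH_NLA_HDRLEN;

	return attr_offset + SKETCH_NLA_HDRLEN;
}
```

For example, an attribute starting at offset 0 would put its payload at
offset 4, so a pad is needed and the payload moves to offset 8. An attribute
starting at offset 4 already has its payload at offset 8 and needs no pad.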

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-19 Thread Roopa Prabhu

On 4/19/16, 12:55 PM, Paul Moore wrote:
> On Tue, Apr 19, 2016 at 4:26 AM, Nicolas Dichtel wrote:
>> + selinux maintainers
>>
>> Le 18/04/2016 23:10, Roopa Prabhu a écrit :
>> [snip]
>>> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
>>> index 8495b93..1714633 100644
>>> --- a/security/selinux/nlmsgtab.c
>>> +++ b/security/selinux/nlmsgtab.c
>>> @@ -76,6 +76,8 @@ static struct nlmsg_perm nlmsg_route_perms[] =
>>> { RTM_NEWNSID,  NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
>>> { RTM_DELNSID,  NETLINK_ROUTE_SOCKET__NLMSG_READ  },
>>> { RTM_GETNSID,  NETLINK_ROUTE_SOCKET__NLMSG_READ  },
>>> +   { RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
>> I would say it's NETLINK_ROUTE_SOCKET__NLMSG_READ, not WRITE. This command
>> is only sent by the kernel, not by the userland.
> From what I could tell from the patch description, it looks like
> RTM_NEWSTATS only dumps stats to userspace and doesn't alter the state
> of the kernel, is that correct?  If so, then yes, NLMSG__READ is the
> right SELinux permission.  However, if RTM_NEWSTATS does alter the
> state/configuration of the kernel then we should use NLMSG__WRITE.
>
okay, will change it to READ in the next version,

thanks.

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-19 Thread Roopa Prabhu

On 4/19/16, 3:49 PM, David Miller wrote:
> From: Roopa Prabhu 
> Date: Tue, 19 Apr 2016 12:05:00 -0700
>
>> ok, will do. one thing though, for GETSTATS, if I need a pad
>> attribute like IFLA_PAD, I will need to add a new stats attribute
>> IFLA_STATS_PAD and burn a bit for it in filter_mask too.  In which
>> case, I am wondering if we should live with the copy. I will take
>> any suggestions here.
> I don't think the copy is appropriate, especially if the existing full
> link state dump gets away without it.  We're adding this facility for
> performance reasons after all.
>
> You have several options to avoid wasting filter mask space. For
> example, you could use IFLA_STATS_UNSPEC, which should be OK since
> only new applications will use these.
>
> Or you could make IFLA_STATS_PAD the first attribute, and define the
> filter mask as relative to it.  Ie. IFLA_STATS_LINK_64 uses bit
> (IFLA_STATS_LINK_64 - IFLA_STATS_PAD), etc.
ok ack, makes sense.

thanks,
Roopa
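
One possible reading of the suggestions above, sketched under assumptions:
the v6 posting in this thread reuses IFLA_STATS_UNSPEC as the pad attribute,
and the iproute2 WIP patch defines the filter bit as (1 << (ATTR - 1)), so
attribute 0 consumes no bit in filter_mask. The enum below is a hypothetical
local mirror, and sk_stats_filter_bit() is an illustrative function with a
range check that the macro form does not have.

```c
#include <stdint.h>

/* Hypothetical mirror of the stats attribute enum. */
enum {
	SK_STATS_UNSPEC,	/* value 0, doubles as the 64-bit pad attribute */
	SK_STATS_LINK_64,
	__SK_STATS_MAX,
};

/* Attribute N maps to bit N - 1, so the pad attribute (0) has no bit
 * and a 32-bit mask covers attributes 1..32. Out-of-range values
 * return an empty mask instead of shifting by an invalid amount.
 */
static uint32_t sk_stats_filter_bit(int attr)
{
	if (attr <= SK_STATS_UNSPEC || attr > 32)
		return 0;

	return UINT32_C(1) << (attr - 1);
}
```

Compared with the earlier (1 << (ATTR)) form, this keeps bit 0 usable for the
first real attribute.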

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-19 Thread Roopa Prabhu

On 4/19/16, 4:50 PM, David Miller wrote:
> From: Nicolas Dichtel 
> Date: Tue, 19 Apr 2016 21:08:21 +0200
>
>> Le 19/04/2016 20:47, Eric Dumazet a écrit :
>>> Since we want to use this in other places, we could define a helper.
>>>
>>> nla_align_64bit(skb, attribute)  or something.
>> Yes, with the corresponding nla_total_size_64bit()
> Good idea, committed the following:
>
> Roopa, please use these helpers in your RTM_GETSTATS patch.

will do.
>
> Thank you.
>
> 
> [PATCH] net: Add helpers for 64-bit aligning netlink attributes.
>
> Suggested-by: Eric Dumazet 
> Suggested-by: Nicolas Dichtel 
> Signed-off-by: David S. Miller 
> ---
>  
these look really nice.

Thanks!

[PATCH net-next v6] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-20 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
link af stats:
- IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
- IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
extended stats:
- IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like
  bridge, vlan, vxlan etc)
- IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The filter
mask is specified in the new header 'struct if_stats_msg' for stats messages.
The other important field in the header is the ifindex.

This api can also include attributes for global stats (eg tcp) in the future.
When global stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, the names of netdev specific
stat attributes are prefixed with IFLA_STATS_LINK_.

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats

v3:
- add required padding to the if_stats_msg structure(suggested by jamal)
- rename netdev stat attributes with IFLA_STATS_LINK prefix
  so that they are easily distinguishable with global
  stats in the future (after global stats discussion with thomas)
- get rid of unnecessary copy when getting stats with dev_get_stats
  (suggested by dave)

v4:
- dropped calcit and af stats from this patch. Will add it
  back when it becomes necessary and with the first af stats
  patch
- add check for null filter in dump and return -EINVAL:
  this follows rtnl_fdb_dump in returning an error.
  But since netlink_dump does not propagate the error
  to the user, the user will not see an error
  but will also not see any data. This is consistent with
  other kinds of dumps.

v5:
- fix selinux nlmsgtab to account for new RTM_*STATS messages

v6:
- fix alignment for 64bit stats attribute, using David's new
  clever trick of a pad attribute and the new helper APIs
- change selinux RTM_NEWSTATS permissions to READ since this
  patch does not support writes yet.

 include/uapi/linux/if_link.h   |  23 ++
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 158 +
 security/selinux/nlmsgtab.c|   4 +-
 4 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 5ffdcb3..115ccc1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -782,4 +782,27 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+/* A stats attribute can be netdev specific or a global stat.
+ * For netdev stats, lets use the prefix IFLA_STATS_LINK_*
+ */
+enum {
+   IFLA_STATS_UNSPEC, /* also used as 64bit pad attribute */
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA

[PATCH iproute2 WIP] ifstat: use new RTM_GETSTATS api

2016-04-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Sample hacked-up patch currently used for testing.
Needs rework if ifstat moves to RTM_GETSTATS.

Signed-off-by: Roopa Prabhu 
---
 include/libnetlink.h  |  6 ++
 include/linux/if_link.h   | 22 ++
 include/linux/rtnetlink.h |  5 +
 lib/libnetlink.c  | 31 +++
 misc/ifstat.c | 37 -
 5 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 491263f..ccaab46 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -44,6 +44,12 @@ int rtnl_dump_request(struct rtnl_handle *rth, int type, void *req,
 int rtnl_dump_request_n(struct rtnl_handle *rth, struct nlmsghdr *n)
__attribute__((warn_unused_result));
 
+int rtnl_wilddump_stats_request(struct rtnl_handle *rth, int family, int type)
+   __attribute__((warn_unused_result));
+int rtnl_wilddump_stats_req_filter(struct rtnl_handle *rth, int family,
+  int type, __u32 filt_mask)
+  __attribute__((warn_unused_result));
+
 struct rtnl_ctrl_data {
int nsid;
 };
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 6a688e8..eb1064a 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -165,6 +165,8 @@ enum {
 #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
 #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
 
+#define IFLA_RTA_STATS(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct if_stats_msg))))
+
 enum {
IFLA_INET_UNSPEC,
IFLA_INET_CONF,
@@ -777,4 +779,24 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)  (1 << (ATTR - 1))
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6aaa2a3..e8cdff5 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+   RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
__RTM_MAX,
 #define RTM_MAX(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index a90e52c..f7baf51 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -838,3 +838,34 @@ int __parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rt
memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
return 0;
 }
+
+int rtnl_wilddump_stats_req_filter(struct rtnl_handle *rth, int family, int type,
+  __u32 filt_mask)
+{
+   struct {
+   struct nlmsghdr nlh;
+   struct if_stats_msg ifsm;
+   } req;
+
+   int err;
+
+   memset(&req, 0, sizeof(req));
+   req.nlh.nlmsg_len = sizeof(req);
+   req.nlh.nlmsg_type = type;
+   req.nlh.nlmsg_flags = NLM_F_DUMP|NLM_F_REQUEST;
+   req.nlh.nlmsg_pid = 0;
+   req.nlh.nlmsg_seq = rth->dump = ++rth->seq;
+   req.ifsm.family = family;
+   req.ifsm.filter_mask = filt_mask;
+
+   err = send(rth->fd, (void*)&req, sizeof(req), 0);
+
+   return err;
+}
+
+int rtnl_wilddump_stats_request(struct rtnl_handle *rth, int family, int type)
+{
+   return rtnl_wilddump_stats_req_filter(rth, family, type,
+ IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64));
+}
+
diff --git a/misc/ifstat.c b/misc/ifstat.c
index abbb4e7..e517c9a 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -35,6 +35,8 @@
 
 #include 
 
+#include "utils.h"
+
 int dump_zeros;
 int reset_history;
 int ignore_history;
@@ -49,6 +51,8 @@ double W;
 char **patterns;
 int npatterns;
 
+struct rtnl_handle rth;
+
 char info_source[128];
 int source_mismatch;
 
@@ -58,9 +62,9 @@ struct ifstat_ent {
struct ifstat_ent   *next;
char*name;
int ifindex;
-   unsigned long long  val[MAXS];
+   __u64   val[MAXS];
double  rate[MAXS];
-   __u32   ival[MAXS];
+   __u64   ival[MAXS];
 };
 
 static const char *stats[MAXS] = {
@@ -109,32 +113,29 @@ static int match(const char *id)
 static int get_nlmsg(const struct sockaddr_nl *who,
 struct nlmsghdr *m, void *arg)
 {
-   struct ifinfomsg *ifi = NLMSG_DATA(m);
-   struct rtattr *tb[IFLA_MAX+1];
+   str

Re: [PATCH iproute2 WIP] ifstat: use new RTM_GETSTATS api

2016-04-20 Thread Roopa Prabhu

On 4/20/16, 11:53 AM, Stephen Hemminger wrote:
> On Wed, 20 Apr 2016 09:16:15 -0700
> Roopa Prabhu  wrote:
>
>> +int rtnl_wilddump_stats_req_filter(struct rtnl_handle *rth, int family, int type,
>> +   __u32 filt_mask)
>> +{
>> +struct {
>> +struct nlmsghdr nlh;
>> +struct if_stats_msg ifsm;
>> +} req;
> Please use C99 initialization instead of memset in new code.

yes, ack.
>
>> +int err;
>> +
>> +memset(&req, 0, sizeof(req));
>> +req.nlh.nlmsg_len = sizeof(req);
>> +req.nlh.nlmsg_type = type;
>> +req.nlh.nlmsg_flags = NLM_F_DUMP|NLM_F_REQUEST;
>> +req.nlh.nlmsg_pid = 0;
>> +req.nlh.nlmsg_seq = rth->dump = ++rth->seq;
>> +req.ifsm.family = family;
>> +req.ifsm.filter_mask = filt_mask;
>> +
>> +err = send(rth->fd, (void*)&req, sizeof(req), 0);
>> +
>> +return err;
> Why not just:
> return send(rth->fd, &req, sizeof(req), 0);

Yes, I had that initially, and then changed it to add some debugs before
returning.

This is all WIP. Will clean it up.

thanks.
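
A minimal sketch of what the C99 initialization suggested above could look
like for this request. This is not the cleaned-up iproute2 code: struct
if_stats_msg_sketch is a local mirror of the header from the kernel patch,
and struct stats_dump_req and make_stats_dump_req() are hypothetical names.
Fields that are not named (nlmsg_pid, ifindex, the pad fields) are
zero-initialized, so the memset is no longer needed.

```c
#include <linux/netlink.h>
#include <stdint.h>

/* Local mirror of struct if_stats_msg from the kernel patch. */
struct if_stats_msg_sketch {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Hypothetical request layout: netlink header followed by stats header. */
struct stats_dump_req {
	struct nlmsghdr nlh;
	struct if_stats_msg_sketch ifsm;
};

/* Build a dump request with designated initializers instead of
 * memset() followed by field-by-field assignment.
 */
static struct stats_dump_req make_stats_dump_req(int family, int type,
						 uint32_t seq,
						 uint32_t filt_mask)
{
	struct stats_dump_req req = {
		.nlh.nlmsg_len = sizeof(struct stats_dump_req),
		.nlh.nlmsg_type = type,
		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
		.nlh.nlmsg_seq = seq,
		.ifsm.family = family,
		.ifsm.filter_mask = filt_mask,
	};

	return req;
}
```

The caller would then pass the address of the returned struct and its size
to send(), as in the quoted patch.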

Re: [PATCH net-next v6] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-20 Thread Roopa Prabhu

On 4/20/16, 1:08 PM, David Miller wrote:
> From: Roopa Prabhu 
> Date: Wed, 20 Apr 2016 08:43:43 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via netlink
>> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
>> returns a lot more than just stats and is expensive in some cases when
>> frequent polling for stats from userspace is a common operation.
> With nla_align_64bit() now working properly, I've applied this and it works
> on sparc64 too.
>
> Thanks!
Thank you.

Re: switchdev fib offload issues

2016-04-21 Thread Roopa Prabhu

On 4/18/16, 10:17 AM, Hannes Frederic Sowa wrote:
> Hi Jiri,
>
> On 18.04.2016 17:47, Jiri Pirko wrote:
>> Proposed solutions (ideas):
>> 1) per-netns. Add a procfs file:
>> /proc/sys/net/ipv4/route/fib_offload_error_policy
>>   with values: "evict" - default, current behaviour
>> "fail" - propagate offload error to user
>> The policy value would be stored in struct net.
> >
>> 2) per-VRF/table
>> When user creates a VRF master, he specifies a table ID
>> this VRF is going to use. I propose to extend this so
>> he can pass a policy ("evict"/"fail").
>> The policy value would be stored in struct fib_table or
>> struct fib6_table. The problem is that vrf only saves
>> table ID, allocates dst but does not actually create
>> table. That might be created later. But I think this
>> could be resolved.
>>
>> 3) per-VRF/master_netdev
>> In this case, the policy would be also set during
>> the creation of VRF master. From user perspective,
>> this looks same as 2)
>> The policy value would be stored in struct net_vrf (vrf private).
>
> I agree that a fail policy is probably the way forward regarding the issues
> you outlined.
>
> One question though:
>
> Shouldn't the policy be an attribute of the switch, e.g. configurable by
> devlink (maybe also not the right place)? Not sure how user space can
> otherwise make correct assumptions about the state of the switch and initiate
> proper countermeasures (e.g. reducing the smallest prefix length installed to
> hardware).
>
I am with Hannes here. If we introduce a policy, I think it should be global or
per switchdev instance (possibly via devlink).
This would be a system policy (set by the administrator), and the user or app
does not need to know about it.

Re: [PATCH] net: ipv6: Delete host routes on an ifdown

2016-04-25 Thread Roopa Prabhu

On 4/25/16, 1:42 PM, David Miller wrote:
> From: David Ahern 
> Date: Mon, 25 Apr 2016 13:40:26 -0600
>
>> It's unfortunate you want to take that action. Last week I came across
>> a prior attempt by Stephen to do this same thing -- keep IPv6
>> addresses. That prior attempt was reverted by commit
>> 73a8bd74e261. Cumulus, Brocade, and others clearly want this
>> capability.
> But nobody has implemented it correctly, it doesn't matter who wants
> the feature.  That's why it keeps getting reverted.
>
> Also, this testing you are talking about should have happened long
> before you submitted that first patch that introduced all of these
> regressions.  My observations tell me that the bulk of the testing
> happened afterwards and that's why all the regressions are popping up
> now.
Sorry if it seems that way. But we have been testing several versions of this
patch internally. davidA has been throwing it at all of our internal tests just
to make sure it gets all the testing it needs before 4.6 goes out. This last fix
was for something that I think got introduced in one of the later versions while
re-implementing bits of it based on feedback. One of our recent tests under
stress caught it, and we rushed the fix out.

thanks,
Roopa

Re: [PATCH net-next 2/7] net: rtnetlink: allow only one idx saving stats attribute

2016-04-27 Thread Roopa Prabhu

On 4/27/16, 9:18 AM, Nikolay Aleksandrov wrote:
> We can't allow more than one stats attribute which uses the local idx
> since the result will be a mess. This is a simple check to make sure
> only one is being used at a time. Later when the filter_mask's 32 bits
> are over we can switch to a bitmap.
>
> Signed-off-by: Nikolay Aleksandrov 
> ---
>  include/net/rtnetlink.h |  6 ++
>  net/core/rtnetlink.c| 17 +++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index 2f87c1ba13de..3f3b0b1b8722 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -150,4 +150,10 @@ int rtnl_nla_parse_ifla(struct nlattr **tb, const struct 
> nlattr *head, int len);
>  
>  #define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)
>  
> +/* at most one attribute which can save a local idx is allowed to be set
> + * IFLA_STATS_IDX_ATTR_MASK has all the idx saving attributes set and is
> + * used to check if more than one is being requested
> + */
> +#define IFLA_STATS_IDX_ATTR_MASK 0
> +
>  #endif
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index aeb2fa9b1cda..ea03b6cd3d3c 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3512,7 +3512,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct 
> nlmsghdr *nlh)
>   struct if_stats_msg *ifsm;
>   struct net_device *dev = NULL;
>   struct sk_buff *nskb;
> - u32 filter_mask;
> + u32 filter_mask, lidx_filter;
>   int lidx = 0;
>   int err;
>  
> @@ -3529,6 +3529,14 @@ static int rtnl_stats_get(struct sk_buff *skb, struct 
> nlmsghdr *nlh)
>   if (!filter_mask)
>   return -EINVAL;
>  
> + /* only one attribute which can save a local idx is allowed at a time
> +  * even though rtnl_stats_get doesn't save the lidx, we need to be
> +  * consistent with the dump side and error out
> +  */
> + lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
> + if (lidx_filter && !is_power_of_2(lidx_filter))
> + return -EINVAL;
> +
>   nskb = nlmsg_new(if_nlmsg_stats_size(dev, filter_mask), GFP_KERNEL);
>   if (!nskb)
>   return -ENOBUFS;
> @@ -3556,7 +3564,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   struct net_device *dev;
>   struct hlist_head *head;
>   unsigned int flags = NLM_F_MULTI;
> - u32 filter_mask = 0;
> + u32 filter_mask = 0, lidx_filter;
>   int err;
>  
>   s_h = cb->args[0];
> @@ -3570,6 +3578,11 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   if (!filter_mask)
>   return -EINVAL;
>  
> + /* only one attribute which can save a local idx is allowed at a time */
> + lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
> + if (lidx_filter && !is_power_of_2(lidx_filter))
> + return -EINVAL;
> +
>   
Instead of introducing the restriction at this level, is it possible to use two
args for this, like below, and avoid the restriction?
cb->args[2] = current filter being processed
cb->args[3] = private filter idx (your lidx)
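
A rough userspace sketch of that idea, to show how the two saved values let a dump resume mid-attribute even when several idx-saving attributes are requested at once. The struct dump_cb type, NR_ATTRS, and stats_fill() are stand-ins invented for illustration; only the args[2]/args[3] split comes from the suggestion above.

```c
#include <stdint.h>

#define NR_ATTRS 3	/* illustrative: three stats attributes */

/* Stand-in for struct netlink_callback; only the args array matters here. */
struct dump_cb {
	long args[6];
};

/*
 * Emit up to 'room' entries for one device, resuming from the state saved
 * in cb->args[2] (attribute being processed) and cb->args[3] (index inside
 * that attribute). nr_items[i] is how many entries attribute i holds.
 * Returns the number of entries emitted; *done is set when nothing is left.
 */
static int stats_fill(struct dump_cb *cb, uint32_t filter_mask,
		      const int nr_items[NR_ATTRS], int room, int *done)
{
	int attr = (int)cb->args[2];
	int idx = (int)cb->args[3];
	int emitted = 0;

	*done = 0;
	for (; attr < NR_ATTRS; attr++, idx = 0) {
		if (!(filter_mask & (1u << attr)))
			continue;
		for (; idx < nr_items[attr]; idx++) {
			if (emitted == room) {
				/* out of space: remember where to resume */
				cb->args[2] = attr;
				cb->args[3] = idx;
				return emitted;
			}
			emitted++;
		}
	}
	cb->args[2] = 0;
	cb->args[3] = 0;
	*done = 1;
	return emitted;
}
```

Because the attribute position and the index within it are stored separately, the filter mask may carry more than one idx-saving bit without the saved index becoming ambiguous.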

[PATCH iproute2 net-next] ifstat: move to new RTM_GETSTATS api

2016-04-29 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch modifies ifstat to use the new RTM_GETSTATS api
to query stats from the kernel. In the process this also
moves ifstat to use 64 bit stats.
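
For readers without the updated headers, a small sketch of how the request header and filter mask fit together. The struct, enum, and macro are local copies of the definitions this patch adds to if_link.h (with stdint types in place of __u8/__u16/__u32); stats_req_init() is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of the definitions added by the patch, so the sketch
 * builds without the updated uapi headers. */
struct if_stats_msg {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

enum {
	IFLA_STATS_UNSPEC,
	IFLA_STATS_LINK_64,
	__IFLA_STATS_MAX,
};

#define IFLA_STATS_FILTER_BIT(ATTR)	(1 << (ATTR - 1))

/* Hypothetical helper: fill the fixed header of a stats request that
 * asks only for the 64-bit link counters of one interface. */
static void stats_req_init(struct if_stats_msg *ifsm, uint32_t ifindex)
{
	memset(ifsm, 0, sizeof(*ifsm));
	ifsm->family = 0;	/* AF_UNSPEC */
	ifsm->ifindex = ifindex;
	ifsm->filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64);
}
```

Since attribute numbering starts at 1, IFLA_STATS_LINK_64 maps to bit 0 of the mask.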

Signed-off-by: Roopa Prabhu 
---
 include/libnetlink.h  |  3 +++
 include/linux/if_link.h   | 22 ++
 include/linux/rtnetlink.h |  5 +
 lib/libnetlink.c  | 25 +
 misc/ifstat.c | 35 +++
 5 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 491263f..e623a3c 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -44,6 +44,9 @@ int rtnl_dump_request(struct rtnl_handle *rth, int type, void 
*req,
 int rtnl_dump_request_n(struct rtnl_handle *rth, struct nlmsghdr *n)
__attribute__((warn_unused_result));
 
+int rtnl_stats_dump_request(struct rtnl_handle *rth, __u32 filt_mask)
+   __attribute__((warn_unused_result));
+
 struct rtnl_ctrl_data {
int nsid;
 };
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 6a688e8..68f3270 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -165,6 +165,8 @@ enum {
 #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + 
NLMSG_ALIGN(sizeof(struct ifinfomsg
 #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
 
+#define IFLA_RTA_STATS(r)  ((struct rtattr *)(((char *)(r)) + 
NLMSG_ALIGN(sizeof(struct if_stats_msg
+
 enum {
IFLA_INET_UNSPEC,
IFLA_INET_CONF,
@@ -777,4 +779,24 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)  (1 << (ATTR - 1))
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6aaa2a3..e8cdff5 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+   RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+   RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
__RTM_MAX,
 #define RTM_MAX(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index a90e52c..95f80fc 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -838,3 +838,28 @@ int __parse_rtattr_nested_compat(struct rtattr *tb[], int 
max, struct rtattr *rt
memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
return 0;
 }
+
+int rtnl_stats_dump_request(struct rtnl_handle *rth, __u32 filt_mask)
+{
+   struct {
+   struct nlmsghdr nlh;
+   struct if_stats_msg ifsm;
+   } req = {
+   .nlh.nlmsg_type = RTM_GETSTATS,
+   .nlh.nlmsg_flags = NLM_F_DUMP|NLM_F_REQUEST,
+   .nlh.nlmsg_pid = 0,
+   .ifsm.family = AF_UNSPEC,
+   .ifsm.ifindex = 0,
+   .ifsm.filter_mask = filt_mask,
+   };
+
+   if (!filt_mask) {
+   perror("invalid stats filter mask");
+   return -1;
+   }
+
+   req.nlh.nlmsg_seq = rth->dump = ++rth->seq;
+   req.nlh.nlmsg_len = sizeof(req);
+
+   return send(rth->fd, (void *)&req, sizeof(req), 0);
+}
diff --git a/misc/ifstat.c b/misc/ifstat.c
index abbb4e7..bf9b9fa 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -35,6 +35,8 @@
 
 #include 
 
+#include "utils.h"
+
 int dump_zeros;
 int reset_history;
 int ignore_history;
@@ -49,6 +51,8 @@ double W;
 char **patterns;
 int npatterns;
 
+struct rtnl_handle rth;
+
 char info_source[128];
 int source_mismatch;
 
@@ -58,9 +62,9 @@ struct ifstat_ent {
struct ifstat_ent   *next;
char*name;
int ifindex;
-   unsigned long long  val[MAXS];
+   __u64   val[MAXS];
double  rate[MAXS];
-   __u32   ival[MAXS];
+   __u64   ival[MAXS];
 };
 
 static const char *stats[MAXS] = {
@@ -109,32 +113,29 @@ static int match(const char *id)
 static int get_nlmsg(const struct sockaddr_nl *who,
 struct nlmsghdr *m, void *arg)
 {
-   struct ifinfomsg *ifi = NLMSG_DATA(m);
-   struct rtattr *tb[IFLA_MAX+1];
+   struct if_stats_msg *ifsm = NLMSG_DATA(m);
+   struct rtattr *tb[IFLA_STATS_MAX+1];
int len = m->nlmsg_len;
struct ifstat_ent *n;
int i;
 
-   if (m->nlmsg_type != RTM_NEWLINK)
+   if (m->nlmsg_type != RTM_NEWSTATS)
return 0;
 
-   len -= NLMSG_LENGTH(sizeof(*ifi));
+   len -= NLMSG_LENGTH(sizeof(

Re: [PATCH iproute2 net-next] ifstat: move to new RTM_GETSTATS api

2016-04-30 Thread Roopa Prabhu

On 4/30/16, 3:21 AM, Jamal Hadi Salim wrote:
> On 16-04-30 02:41 AM, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> This patch modifies ifstat to use the new RTM_GETSTATS api
>> to query stats from the kernel. In the process this also
>> moves ifstat to use 64 bit stats.
>
> Breaks old kernels? May need to keep backward compat of
> RTM_NEWLINK and even new main struct for GETSTATS.
Yes, I was wondering about that. v2 coming. If GETSTATS fails, I will fall back
to RTM_NEWLINK.
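
The fallback could look roughly like the sketch below. It is not the v2 code: the callback type, the enum, and the two stub query functions are hypothetical placeholders for the real netlink round trips.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, negative errno on error. */
typedef int (*stats_query_fn)(void *ctx);

enum stats_source {
	STATS_FROM_GETSTATS,
	STATS_FROM_NEWLINK,
	STATS_UNAVAILABLE,
};

/*
 * Try the new request first; if the kernel rejects it (for example an
 * older kernel that does not know the message type), retry with the
 * legacy link dump.
 */
static enum stats_source load_stats(stats_query_fn try_getstats,
				    stats_query_fn try_newlink, void *ctx)
{
	if (try_getstats && try_getstats(ctx) == 0)
		return STATS_FROM_GETSTATS;
	if (try_newlink && try_newlink(ctx) == 0)
		return STATS_FROM_NEWLINK;
	return STATS_UNAVAILABLE;
}

/* Stand-ins for the real netlink round trips, for illustration only. */
static int query_unsupported(void *ctx)
{
	(void)ctx;
	return -EOPNOTSUPP;
}

static int query_ok(void *ctx)
{
	(void)ctx;
	return 0;
}
```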

thanks!

[PATCH net-next v7] mpls: support for dead routes

2015-12-01 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
---
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2
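
As an illustration of "ignore dead routes during route selection", here is a simplified userspace sketch: the hash is reduced modulo the number of alive nexthops, and dead or linkdown entries are skipped while counting. struct nh and select_alive() are invented for this example and are not the af_mpls.c code; the flag values are the ones from linux/rtnetlink.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in the uapi header linux/rtnetlink.h */
#define RTNH_F_DEAD	1
#define RTNH_F_LINKDOWN	16

/* Simplified stand-in for a nexthop; only the flags matter here. */
struct nh {
	unsigned int flags;
};

/*
 * Pick the (hash % alive)-th usable nexthop, skipping entries that are
 * marked dead or linkdown. Returns NULL when no nexthop is usable.
 */
static const struct nh *select_alive(const struct nh *nhs, int nhn,
				     int alive, uint32_t hash)
{
	int want, n = 0, i;

	if (alive <= 0)
		return NULL;

	want = (int)(hash % (uint32_t)alive);
	for (i = 0; i < nhn; i++) {
		if (nhs[i].flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
			continue;
		if (n == want)
			return &nhs[i];
		n++;
	}
	return NULL;
}
```

In the swp1/swp2 example above, once swp1 is marked dead or linkdown every hash value ends up on the swp2 nexthop.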

Signed-off-by: Roopa Prabhu 
---
RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What i have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the
future

v1 to v2:
Fix dead nexthop checks as suggested by dave

v2 to v3:
Fix duplicated argument reported by kbuild test robot

v3 - v4:
- removed per route rt_flags and derive it from the nh_flags during 
dumps
- use kmemdup to make a copy of the route during route updates
  due to link events

v4 -v5
- if kmemdup fails, modify the original route in place. This is a
corner case and only side effect is that in the remote case
of kmemdup failure, the changes will not be atomically visible
to datapath.
- replace for_nexthops with change_nexthops in a bunch of places.
- fix indent

v5 - v6
- update routes in place in mpls netdev notifier handlers. 
the additional kmemdup complexity and failure path recovery
does not seem necessary to support the transient atomic update
case

v6 - v7
- Use ACCESS_ONCE when accessing rt_nhn_alive from notifiers and
  datapath as suggested by Robert

 net/mpls/af_mpls.c  | 185 
 net/mpls/internal.h |   2 +
 2 files changed, 159 insertions(+), 28 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..4b3b9b3 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,7 +158,38 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   int alive = ACCESS_ONCE(rt->rt_nhn_alive);
+   u32 hash = 0;
+   int nh_index = 0;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_ind

[PATCH net-next] mpls_iptunnel: add static qualifier to mpls_output

2015-12-09 Thread Roopa Prabhu

From: Roopa Prabhu 

This gets rid of the following compile warn:
net/mpls/mpls_iptunnel.c:40:5: warning: no previous prototype for
mpls_output [-Wmissing-prototypes]

Signed-off-by: Roopa Prabhu 
---
 net/mpls/mpls_iptunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 67591ae..cdd01e6 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -37,7 +37,7 @@ static unsigned int mpls_encap_size(struct 
mpls_iptunnel_encap *en)
return en->labels * sizeof(struct mpls_shim_hdr);
 }
 
-int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+static int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
struct mpls_iptunnel_encap *tun_encap_info;
struct mpls_shim_hdr *hdr;
-- 
1.9.1


Re: [patch net-next v4 0/4] return offloaded stats as default and expose original sw stats

2016-06-19 Thread Roopa Prabhu

On Fri, Jun 17, 2016 at 7:05 AM, Jiri Pirko  wrote:
> Fri, Jun 17, 2016 at 03:48:35PM CEST, d...@cumulusnetworks.com wrote:
>>On 6/17/16 2:24 AM, Jiri Pirko wrote:
>>>
>>>The problem we try to handle is different, it's about offloaded
>>>forwarded packets which are not seen by kernel. Let me try to draw it :)
>>>
>>>port1   port2 (HW stats are counted here)
>>>  \  /
>>>   \/
>>>\  /
>>> --(A) ASIC --(B)--
>>>|
>>>   (C)
>>>|
>>>   CPU (SW stats are counted here)
>>>
>>>
>>>Now we have couple of flows for TX and RX (direction does not matter here):
>>>
>>>1) port1->A->ASIC->C->CPU
>>>
>>>   For this flow, HW and SW stats are equal.
>>>
>>>2) port1->A->ASIC->C->CPU->C->ASIC->B->port2
>>>
>>>   For this flow, HW and SW stats are equal.
>>>
>>>3) port1->A->ASIC->B->port2
>>>
>>>   For this flow, SW stats are 0.
>>>
>>>The purpose of this patchset is to provide facility for user to
>>>find out the difference between flows 1+2 and 3. In other words, user
>>>will be able to see the statistics for his slow-path (through kernel).
>>>
>>>Also, as a default the accumulated stats (HW) will be exposed to user
>>>so the userspace apps can react properly.
>>>
>>
>>You no longer agree with this discussion?
>>  http://comments.gmane.org/gmane.linux.network/346740
>>
>>Essentially netdevice stats show counters for packets punted to the cpu and
>>ethtool -S shows h/w stats. This patch set seems to invert that.
>
> That is problematic. Existing apps depend on rtnetlink stats. But if we
> don't count offloaded forwarded packets, the apps don't see anything.
> Therefore I believe that this patchset approach is better. The existing
> apps continue to work and future apps can use newly introduced sw_stats
> to query slowpath traffic. Makes sense to me.
>

Apps only care about stats; they don't care about sw vs. hardware stats. What
apps are these?
For debugging, I agree it would be useful, but that's why we have always had
ethtool stats, which the driver can break down and display. And in all my
patches about the new stats api, I have indicated that we will migrate the
existing ethtool stats to a new netlink attribute in the new stats api.
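
A small worked example of the three flows in the quoted diagram, assuming (as described there) that the port counters see every packet while the CPU counters see only flows 1 and 2. The struct and function names are made up for illustration.

```c
#include <stdint.h>

struct port_counters {
	uint64_t hw_packets;	/* counted by the ASIC at the port */
	uint64_t sw_packets;	/* counted by the kernel on the CPU path */
};

/* Packets forwarded entirely in hardware (flow 3 in the diagram). */
static uint64_t offloaded_only(const struct port_counters *c)
{
	/* guard against a momentary sw > hw reading from unsynchronised samples */
	return c->hw_packets > c->sw_packets ?
	       c->hw_packets - c->sw_packets : 0;
}
```

For instance, a port reporting 1000 packets in hardware and 150 on the CPU path forwarded 850 packets without the kernel seeing them.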

Re: [patch net-next v4 0/4] return offloaded stats as default and expose original sw stats

2016-06-19 Thread Roopa Prabhu

On Fri, Jun 17, 2016 at 7:54 AM, Jamal Hadi Salim  wrote:
> On 16-06-17 10:05 AM, Jiri Pirko wrote:
>>
>> Fri, Jun 17, 2016 at 03:48:35PM CEST, d...@cumulusnetworks.com wrote:
>>>
>>> On 6/17/16 2:24 AM, Jiri Pirko wrote:

>
>>
>> That is problematic. Existing apps depend on rtnetlink stats. But if we
>> don't count offloaded forwarded packets, the apps don't see anything.
>> Therefore I believe that this patchset approach is better. The existing
>> apps continue to work and future apps can use newly introduced sw_stats
>> to query slowpath traffic. Makes sense to me.
>>
>
> I agree with Jiri. It is a bad idea to depend on ethtool for any of
> this stuff.

The concern should not be that it is an ethtool api.
In all previous discussions on this patchset and also my
stats api patches, i have indicated that we have to move all stats
in one place, so naturally, ethtool stats should move eventually to the
stats api as a new nested netlink attribute. I think i called it
IFLA_STATS_LINK_HW  (or something like that)...
and this nested attribute should provide the flexibility and extensibility
of the current ethtool stats api.

> Is there a way we can tag netlink stats instead
> to indicate they are hardware or software?
> We already have a use case with the tc where someone could get/set
> hardware and/or software.
>
> cheers,
> jamal

Re: [patch net-next v4 0/4] return offloaded stats as default and expose original sw stats

2016-06-19 Thread Roopa Prabhu

On Fri, Jun 17, 2016 at 10:12 AM, Florian Fainelli  wrote:
> On 06/17/2016 08:42 AM, Jiri Pirko wrote:
>> Fri, Jun 17, 2016 at 05:35:53PM CEST, d...@cumulusnetworks.com wrote:
>>> On 6/17/16 8:54 AM, Jamal Hadi Salim wrote:
>>>> On 16-06-17 10:05 AM, Jiri Pirko wrote:
>>>>> Fri, Jun 17, 2016 at 03:48:35PM CEST, d...@cumulusnetworks.com wrote:
>>>>>> On 6/17/16 2:24 AM, Jiri Pirko wrote:
>>>>>>>

>>>>>
>>>>> That is problematic. Existing apps depend on rtnetlink stats. But if we
>>>>> don't count offloaded forwarded packets, the apps don't see anything.
>>>>> Therefore I believe that this patchset approach is better. The existing
>>>>> apps continue to work and future apps can use newly introduced sw_stats
>>>>> to query slowpath traffic. Makes sense to me.
>>>>>
>>>>
>>>> I agree with Jiri. It is a bad idea to depend on ethtool for any of
>>>> this stuff. Is there a way we can tag netlink stats instead
>>>> to indicate they are hardware or software?
>>>
>>> Right, old API but the key here is that low level h/w stats are returned by
>>> a different API.
>>>
>>> By default ip, ifconfig, snmpd, etc all continue to get traditional S/W
>>> stats - counters as seen by the CPU.
>>
>> Yep. And I believe that for offloaded forwarding, these tools should see
>> hw counters, as they show what is going on in real.
>
> If your NIC is offloading packets today, these tools typically won't see
> these stats, but ethtool -S likely will report what is going on under
> the hood.
>
> Do we actually need to tell apart SW maintained from HW maintained
> stats, or at the end all that matters is just, as DaveM pointed out,
> getting the information, and in the case of an Ethernet switch, return
> HW stats by default and supplement with SW stats whenever we have them,
> all in the same namespace?
> --

I have also mentioned this before: the default api must provide accumulated
(hw and sw) stats, because this is the api that the user queries on an
interface. For advanced debugging, people do want a breakdown, and that's what
ethtool has traditionally provided; the new stats api should eventually include
support for ethtool-like stats.

Re: rcu locking issue in mpls output code?

2016-06-20 Thread Roopa Prabhu

On Mon, Jun 20, 2016 at 8:19 AM, David Ahern  wrote:
> On 6/20/16 12:30 AM, Lennert Buytenhek wrote:
>>
>> On Sun, Jun 19, 2016 at 08:19:20PM -0600, David Ahern wrote:
>>
>>>> diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
>>>> index fb31aa8..802956b 100644
>>>> --- a/net/mpls/mpls_iptunnel.c
>>>> +++ b/net/mpls/mpls_iptunnel.c
>>>> @@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct
>>>> sock *sk, struct sk_buff *skb)
>>>> bos = false;
>>>> }
>>>>
>>>> +   rcu_read_lock_bh();
>>>> if (rt)
>>>> err = neigh_xmit(NEIGH_ARP_TABLE, out_dev,
>>>> &rt->rt_gateway,
>>>>  skb);
>>>> else if (rt6)
>>>> err = neigh_xmit(NEIGH_ND_TABLE, out_dev,
>>>> &rt6->rt6i_gateway,
>>>>  skb);
>>>> +   rcu_read_unlock_bh();
>>>> +
>>>> if (err)
>>>> net_dbg_ratelimited("%s: packet transmission failed:
>>>> %d\n",
>>>> __func__, err);
>>>>
>>>
>>> I think those need to be added to neigh_xmit in the
>>>
>>> if (likely(index < NEIGH_NR_TABLES)) {
>>>
>>> }
>>
>>
>> That'll force callers that don't need the extra protection (i.e.
>> mpls_forward(), since that always runs from softirq and it's enough
>> to protect the neigh state with rcu_read_lock() from softirq and we're
>> already running under rcu_read_lock() when we get to neigh_xmit()) to
>> eat the useless overhead of an extra rcu_read_{,un}lock_bh() pair, but
>> sure, functionally that's correct, I think, and in my workload I don't
>> care about MPLS forwarding performance anyway. ;-)
>
>
> __neigh_lookup_noref expects bh level protection. Since the if block in
> neigh_xmit requires the locking seems like this is the appropriate place for
> it.
>
>>
>> Want me to send a patch moving it to neigh_xmit() ?
>
>
> Roopa/Robert: agree?
>

Yes, that seems like an appropriate place for it, provided it does not add
unnecessary overhead for others.
But then neigh_xmit seems to be called only from mpls_output and mpls_forward.

thanks!

[PATCH iproute2 net-next v3 5/5] bridge: update man page

2016-06-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 man/man8/bridge.8 | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 08e8a5b..abaee63 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -20,8 +20,9 @@ bridge \- show / manipulate bridge addresses and devices
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
 \fB\-s\fR[\fItatistics\fR] |
-\fB\-n\fR[\fIetns\fR] name }
-\fB\-b\fR[\fIatch\fR] filename }
+\fB\-n\fR[\fIetns\fR] name |
+\fB\-b\fR[\fIatch\fR] filename |
+\fB\-j\fR[\fIson\fR] }
 
 .ti -8
 .BR "bridge link set"
@@ -153,6 +154,10 @@ Don't terminate bridge command on errors in batch mode.
 If there were any errors during execution of the commands, the application
 return code will be non zero.
 
+.TP
+.BR "\-json"
+Display results in JSON format. Currently available for vlan and fdb.
+
 .SH BRIDGE - COMMAND SYNTAX
 
 .SS
-- 
1.9.1

[PATCH iproute2 net-next v3 3/5] bridge: add json support for bridge fdb show

2016-06-20 Thread Roopa Prabhu

From: Anuradha Karuppiah 

Sample output:
$bridge -j fdb show
[{
"mac": "44:38:39:00:69:88",
"dev": "swp2s0",
"vlan": 2,
"master": "br0",
"state": "permanent"
},{
"mac": "00:02:00:00:00:01",
"dev": "swp2s0",
"vlan": 2,
"master": "br0"
},{
"mac": "00:02:00:00:00:02",
"dev": "swp2s1",
"vlan": 2,
"master": "br0"
},{
"mac": "44:38:39:00:69:89",
"dev": "swp2s1",
"master": "br0",
"state": "permanent"
},{
"mac": "44:38:39:00:69:89",
"dev": "swp2s1",
"vlan": 2,
"master": "br0",
"state": "permanent"
},{
"mac": "44:38:39:00:69:88",
"dev": "br0",
"master": "br0",
"state": "permanent"
}
]
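
The sample above shows that "vlan" and "state" appear only when set. A minimal sketch of that optional-field behaviour follows; it uses plain snprintf rather than the json_writer API, and struct fdb_entry, append(), and fdb_entry_to_json() are names made up for the example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

struct fdb_entry {
	const char *mac;
	const char *dev;
	unsigned int vlan;	/* 0 means "no vlan attribute" */
	const char *master;	/* NULL means "no master" */
	const char *state;	/* NULL means "no state string" */
};

/* Append formatted text at *off; returns -1 if it does not fit. */
static int append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *off, len - *off, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= len - *off)
		return -1;
	*off += (size_t)n;
	return 0;
}

/*
 * Format one entry as a compact JSON object into buf.
 * Returns the length written, or -1 if buf was too small.
 * Strings are assumed to need no JSON escaping.
 */
static int fdb_entry_to_json(const struct fdb_entry *e, char *buf, size_t len)
{
	size_t off = 0;

	if (append(buf, len, &off, "{\"mac\":\"%s\",\"dev\":\"%s\"",
		   e->mac, e->dev))
		return -1;
	if (e->vlan && append(buf, len, &off, ",\"vlan\":%u", e->vlan))
		return -1;
	if (e->master && append(buf, len, &off, ",\"master\":\"%s\"", e->master))
		return -1;
	if (e->state && append(buf, len, &off, ",\"state\":\"%s\"", e->state))
		return -1;
	if (append(buf, len, &off, "}"))
		return -1;
	return (int)off;
}
```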

Signed-off-by: Anuradha Karuppiah 
Signed-off-by: Roopa Prabhu 
---
 bridge/fdb.c | 207 ++-
 1 file changed, 164 insertions(+), 43 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index be849f9..c2bfeb2 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "libnetlink.h"
 #include "br_common.h"
@@ -29,6 +31,8 @@
 
 static unsigned int filter_index, filter_vlan;
 
+json_writer_t *jw_global;
+
 static void usage(void)
 {
fprintf(stderr, "Usage: bridge fdb { add | append | del | replace } 
ADDR dev DEV\n"
@@ -59,6 +63,15 @@ static const char *state_n2a(unsigned int s)
return buf;
 }
 
+static void start_json_fdb_flags_array(bool *fdb_flags)
+{
+   if (*fdb_flags)
+   return;
+   jsonw_name(jw_global, "flags");
+   jsonw_start_array(jw_global);
+   *fdb_flags = true;
+}
+
 int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 {
FILE *fp = arg;
@@ -66,11 +79,12 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
int len = n->nlmsg_len;
struct rtattr *tb[NDA_MAX+1];
__u16 vid = 0;
+   bool fdb_flags = false;
+   const char *state_s;
 
if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-
return 0;
}
 
@@ -86,6 +100,11 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (filter_index && filter_index != r->ndm_ifindex)
return 0;
 
+   if (jw_global) {
+   jsonw_pretty(jw_global, 1);
+   jsonw_start_object(jw_global);
+   }
+
parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
 n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
@@ -95,40 +114,75 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (filter_vlan && filter_vlan != vid)
return 0;
 
-   if (n->nlmsg_type == RTM_DELNEIGH)
-   fprintf(fp, "Deleted ");
+   if (n->nlmsg_type == RTM_DELNEIGH) {
+   if (jw_global)
+   jsonw_string_field(jw_global, "opCode", "deleted");
+   else
+   fprintf(fp, "Deleted ");
+   }
 
if (tb[NDA_LLADDR]) {
SPRINT_BUF(b1);
-   fprintf(fp, "%s ",
-   ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
-   RTA_PAYLOAD(tb[NDA_LLADDR]),
-   ll_index_to_type(r->ndm_ifindex),
-   b1, sizeof(b1)));
+   ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
+   RTA_PAYLOAD(tb[NDA_LLADDR]),
+   ll_index_to_type(r->ndm_ifindex),
+   b1, sizeof(b1));
+   if (jw_global)
+   jsonw_string_field(jw_global, "mac", b1);
+   else
+   fprintf(fp, "%s ", b1);
}
 
-   if (!filter_index && r->ndm_ifindex)
-   fprintf(fp, "dev %s ", ll_index_to_name(r->ndm_ifindex));
+   if (!filter_index && r->ndm_ifindex) {
+   if (jw_global)
+   jsonw_string_fie

[PATCH iproute2 net-next v3 4/5] bridge: add json schema for bridge fdb show

2016-06-20 Thread Roopa Prabhu

From: Anuradha Karuppiah 

we think storing the schema file for the json
format will be useful.

Signed-off-by: Anuradha Karuppiah 
---
 schema/bridge_fdb_schema.json | 62 +++
 1 file changed, 62 insertions(+)
 create mode 100644 schema/bridge_fdb_schema.json

diff --git a/schema/bridge_fdb_schema.json b/schema/bridge_fdb_schema.json
new file mode 100644
index 000..3e5be8d
--- /dev/null
+++ b/schema/bridge_fdb_schema.json
@@ -0,0 +1,62 @@
+{
+"$schema": "http://json-schema.org/draft-04/schema#",
+"description": "bridge fdb show",
+"type": "array",
+"items": {
+"type": "object",
+"properties": {
+"dev": {
+"type": "string"
+},
+"dst": {
+"description" : "host name or ip address",
+"type": "string"
+},
+"flags": {
+"type": "array",
+"items": {
+"enum": ["self", "master", "router", "offload"]
+},
+"uniqueItems": true
+},
+"linkNetNsId": {
+"type": "integer"
+},
+"mac": {
+"type": "string"
+},
+"master": {
+"type": "string"
+},
+"opCode": {
+"description" : "used to indicate fdb entry del",
+"enum": ["deleted"]
+},
+"port": {
+"type": "integer"
+},
+"state": {
+"description" : "permanent, static, stale, state=#x",
+"type": "string"
+},
+"updated": {
+"type": "integer"
+},
+"used": {
+"type": "integer"
+},
+"viaIf": {
+"type": "string"
+},
+"viaIfIndex": {
+"type": "integer"
+},
+"vlan": {
+"type": "integer"
+},
+"vni": {
+"type": "integer"
+}
+}
+}
+}
-- 
1.9.1

[PATCH iproute2 net-next v3 0/5] bridge: json support for fdb and vlan show

2016-06-20 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch series adds json support for a few bridge show commands.
We plan to follow up with json support for additional commands soon.

Anuradha Karuppiah (3):
  json_writer: allow base json data type to be array or object
  bridge: add json support for bridge fdb show
  bridge: add json schema for bridge fdb show

Roopa Prabhu (2):
  bridge: add json support for bridge vlan show
  bridge: update man page

v2 - change vlan flags to an array as suggested by toshiaki

v3 - no change. resubmitting as requested by stephen

 bridge/br_common.h|   1 +
 bridge/bridge.c   |   3 +
 bridge/fdb.c  | 201 +-
 bridge/vlan.c |  93 ++-
 include/json_writer.h |   5 +-
 lib/json_writer.c |  39 +++-
 man/man8/bridge.8 |   9 +-
 misc/ifstat.c |   6 +-
 misc/lnstat.c |   2 +-
 misc/nstat.c  |   4 +-
 schema/bridge_fdb_schema.json |  62 +
 11 files changed, 365 insertions(+), 60 deletions(-)
 create mode 100644 schema/bridge_fdb_schema.json

-- 
1.9.1

[PATCH iproute2 net-next v3 2/5] bridge: add json support for bridge vlan show

2016-06-20 Thread Roopa Prabhu

From: Roopa Prabhu 

$bridge -c vlan show
port    vlan ids
swp1 1 PVID Egress Untagged
 10-13

swp2 1 PVID Egress Untagged
 10-13

br0  1 PVID Egress Untagged

$bridge  -json vlan show
{
"swp1": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10
},{
"vlan": 11
},{
"vlan": 12
},{
"vlan": 13
}
],
"swp2": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10
},{
"vlan": 11
},{
"vlan": 12
},{
"vlan": 13
}
],
"br0": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
}
]
}

$bridge -c -json vlan show
{
"swp1": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10,
"vlanEnd": 13
}
],
"swp2": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10,
"vlanEnd": 13
}
],
"br0": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
}
]
}
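
The difference between the two JSON outputs is that -c collapses consecutive vlan ids into a "vlan"/"vlanEnd" pair. A minimal sketch of that collapsing step is below; struct vlan_range and vlan_ids_to_ranges() are hypothetical names, and the sketch ignores per-vlan flags, which the output above suggests also split ranges (vlan 1 with its flags stays separate).

```c
#include <stddef.h>

struct vlan_range {
	unsigned short start;
	unsigned short end;	/* equal to start for a single vlan */
};

/*
 * Collapse a sorted list of vlan ids into ranges of consecutive ids.
 * Writes at most max ranges to out and returns the number written.
 */
static size_t vlan_ids_to_ranges(const unsigned short *vids, size_t n,
				 struct vlan_range *out, size_t max)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		/* extend the previous range when the id is consecutive */
		if (count && out[count - 1].end + 1 == vids[i]) {
			out[count - 1].end = vids[i];
			continue;
		}
		if (count == max)
			break;
		out[count].start = vids[i];
		out[count].end = vids[i];
		count++;
	}
	return count;
}
```

A range whose start and end are equal would be printed with "vlan" only, as in the br0 entry.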

Signed-off-by: Roopa Prabhu 
---
 bridge/br_common.h |   1 +
 bridge/bridge.c|   5 ++-
 bridge/vlan.c  | 106 ++---
 3 files changed, 97 insertions(+), 15 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 5ea45c9..c649e7d 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -23,4 +23,5 @@ extern int show_stats;
 extern int show_details;
 extern int timestamp;
 extern int compress_vlans;
+extern int json_output;
 extern struct rtnl_handle rth;
diff --git a/bridge/bridge.c b/bridge/bridge.c
index 72f153f..5ff038d 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -23,6 +23,7 @@ int oneline;
 int show_stats;
 int show_details;
 int compress_vlans;
+int json_output;
 int timestamp;
 char *batch_file;
 int force;
@@ -38,7 +39,7 @@ static void usage(void)
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
-"   -c[ompressvlans] }\n");
+"   -c[ompressvlans] | -j[son] }\n");
exit(-1);
 }
 
@@ -173,6 +174,8 @@ main(int argc, char **argv)
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
++force;
+   } else if (matches(opt, "-json") == 0) {
+   ++json_output;
} else if (matches(opt, "-batch") == 0) {
argc--;
argv++;
diff --git a/bridge/vlan.c b/bridge/vlan.c
index 717025a..fbf14c8 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "libnetlink.h"
@@ -15,6 +16,8 @@
 
 static unsigned int filter_index, filter_vlan;
 
+json_writer_t *jw_global = NULL;
+
 static void usage(void)
 {
fprintf(stderr, "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ 
pvid] [ untagged ]\n");
@@ -158,6 +161,28 @@ static int filter_vlan_check(struct bridge_vlan_info 
*vinfo)
return 1;
 }
 
+static void print_vlan_port(FILE *fp, int ifi_index)
+{
+   if (jw_global) {
+   jsonw_pretty(jw_global, 1);
+   jsonw_name(jw_global,
+  ll_index_to_name(ifi_index));
+   jsonw_start_array(jw_global);
+   } else {
+   fprintf(fp, "%s",
+   ll_index_to_name(ifi_index));
+   }
+}
+
+static void start_json_vlan_flags_array(bool *vlan_flags)
+{
+   if (*vlan_flags)
+   return;
+   jsonw_name(jw_global, "flags");
+   jsonw_start_array(jw_global);
+   *vlan_flags = true;
+}
+
 static int print_vlan(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg)
@@ -166,6 +191,8 @@ static int print_vlan(const struct sockaddr_nl *who,
struct ifinfomsg *ifm = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[IFLA_MAX+1];
+   bool vlan_flags;
+   char flags[80]

[PATCH iproute2 net-next v3 1/5] json_writer: allow base json data type to be array or object

2016-06-20 Thread Roopa Prabhu

From: Anuradha Karuppiah 

This patch adds a type qualifier to json_writer. Type can be a
json object or array. This can be extended to other types like
json-string, json-number etc in the future.

Signed-off-by: Anuradha Karuppiah 
---
 include/json_writer.h |  5 +++--
 lib/json_writer.c | 39 +++
 misc/ifstat.c |  6 +++---
 misc/lnstat.c |  2 +-
 misc/nstat.c  |  4 ++--
 5 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/include/json_writer.h b/include/json_writer.h
index ab9a008..e04a40a 100644
--- a/include/json_writer.h
+++ b/include/json_writer.h
@@ -21,8 +21,9 @@
 /* Opaque class structure */
 typedef struct json_writer json_writer_t;
 
-/* Create a new JSON stream */
-json_writer_t *jsonw_new(FILE *f);
+/* Create a new JSON stream with data type */
+json_writer_t *jsonw_new_object(FILE *f);
+json_writer_t *jsonw_new_array(FILE *f);
 /* End output to JSON stream */
 void jsonw_destroy(json_writer_t **self_p);
 
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 2af16e1..420cd87 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -22,11 +22,17 @@
 
 #include "json_writer.h"
 
+enum jsonw_data_type {
+   JSONW_TYPE_OBJECT,
+   JSONW_TYPE_ARRAY
+};
+
 struct json_writer {
FILE*out;   /* output file */
unsigneddepth;  /* nesting */
boolpretty; /* optional whitepace */
charsep;/* either nul or comma */
+   int type;   /* currently either object or array */
 };
 
 /* indentation for pretty print */
@@ -94,7 +100,7 @@ static void jsonw_puts(json_writer_t *self, const char *str)
 }
 
 /* Create a new JSON stream */
-json_writer_t *jsonw_new(FILE *f)
+static json_writer_t *jsonw_new(FILE *f, int type)
 {
json_writer_t *self = malloc(sizeof(*self));
if (self) {
@@ -102,11 +108,29 @@ json_writer_t *jsonw_new(FILE *f)
self->depth = 0;
self->pretty = false;
self->sep = '\0';
-   putc('{', self->out);
+   self->type = type;
+   switch (self->type) {
+   case JSONW_TYPE_OBJECT:
+   putc('{', self->out);
+   break;
+   case JSONW_TYPE_ARRAY:
+   putc('[', self->out);
+   break;
+   }
}
return self;
 }
 
+json_writer_t *jsonw_new_object(FILE *f)
+{
+   return jsonw_new(f, JSONW_TYPE_OBJECT);
+}
+
+json_writer_t *jsonw_new_array(FILE *f)
+{
+   return jsonw_new(f, JSONW_TYPE_ARRAY);
+}
+
 /* End output to JSON stream */
 void jsonw_destroy(json_writer_t **self_p)
 {
@@ -114,7 +138,14 @@ void jsonw_destroy(json_writer_t **self_p)
 
assert(self->depth == 0);
jsonw_eol(self);
-   fputs("}\n", self->out);
+   switch (self->type) {
+   case JSONW_TYPE_OBJECT:
+   fputs("}\n", self->out);
+   break;
+   case JSONW_TYPE_ARRAY:
+   fputs("]\n", self->out);
+   break;
+   }
fflush(self->out);
free(self);
*self_p = NULL;
@@ -267,7 +298,7 @@ void jsonw_null_field(json_writer_t *self, const char *prop)
 #ifdef TEST
 int main(int argc, char **argv)
 {
-   json_writer_t *wr = jsonw_new(stdout);
+   json_writer_t *wr = jsonw_new_object(stdout);
 
jsonw_pretty(wr, true);
jsonw_name(wr, "Vyatta");
diff --git a/misc/ifstat.c b/misc/ifstat.c
index abbb4e7..29aa63c 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -240,7 +240,7 @@ static void load_raw_table(FILE *fp)
 
 static void dump_raw_db(FILE *fp, int to_hist)
 {
-   json_writer_t *jw = json_output ? jsonw_new(fp) : NULL;
+   json_writer_t *jw = json_output ? jsonw_new_object(fp) : NULL;
struct ifstat_ent *n, *h;
 
h = hist_db;
@@ -447,7 +447,7 @@ static void print_one_if(FILE *fp, const struct ifstat_ent 
*n,
 
 static void dump_kern_db(FILE *fp)
 {
-   json_writer_t *jw = json_output ? jsonw_new(fp) : NULL;
+   json_writer_t *jw = json_output ? jsonw_new_object(fp) : NULL;
struct ifstat_ent *n;
 
if (jw) {
@@ -473,7 +473,7 @@ static void dump_kern_db(FILE *fp)
 static void dump_incr_db(FILE *fp)
 {
struct ifstat_ent *n, *h;
-   json_writer_t *jw = json_output ? jsonw_new(fp) : NULL;
+   json_writer_t *jw = json_output ? jsonw_new_object(fp) : NULL;
 
h = hist_db;
if (jw) {
diff --git a/misc/lnstat.c b/misc/lnstat.c
index 659a01b..2988e9e 100644
--- a/misc/lnstat.c
+++ b/misc/lnstat.c
@@ -110,7 +110,7 @@ static void print_line(FILE *of, const struct lnstat_file 
*lnstat_files,
 static void print_json(FILE *of, const struct lnstat_file *lnstat_files,
   const struct field_params *fp)
 {
-   json_writer_t *jw = jsonw_new(of);
+   json_writer_t *jw = jsonw_new_object(of);
int i;
 
jsonw_start_object(jw);
di

[PATCH iproute2 net-next v4 0/5] bridge: json support for fdb and vlan show

2016-06-22 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch series adds json support for a few bridge show commands.
We plan to follow up with json support for additional commands soon.

Anuradha Karuppiah (3):
  json_writer: Removed automatic json-object type from the constructor
  bridge: add json support for bridge fdb show
  bridge: add json schema for bridge fdb show

Roopa Prabhu (2):
  bridge: add json support for bridge vlan show
  bridge: update man page

v2 - change vlan flags to an array as suggested by toshiaki

v3 - no change. resubmitting as requested by stephen

v4 - removed json type from constructor as recommended by stephen

 bridge/br_common.h|   1 +
 bridge/bridge.c   |   5 +-
 bridge/fdb.c  | 210 +-
 bridge/vlan.c | 109 +++---
 lib/json_writer.c |   6 +-
 man/man8/bridge.8 |   9 +-
 misc/ifstat.c |   7 ++
 misc/nstat.c  |   6 ++
 schema/bridge_fdb_schema.json |  62 +
 9 files changed, 352 insertions(+), 63 deletions(-)
 create mode 100644 schema/bridge_fdb_schema.json

-- 
1.9.1

[PATCH iproute2 net-next v4 2/5] bridge: add json support for bridge vlan show

2016-06-22 Thread Roopa Prabhu

From: Roopa Prabhu 

$bridge -c vlan show
portvlan ids
swp1 1 PVID Egress Untagged
 10-13

swp2 1 PVID Egress Untagged
 10-13

br0  1 PVID Egress Untagged

$bridge  -json vlan show
{
"swp1": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10
},{
"vlan": 11
},{
"vlan": 12
},{
"vlan": 13
}
],
"swp2": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10
},{
"vlan": 11
},{
"vlan": 12
},{
"vlan": 13
}
],
"br0": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
}
]
}

$bridge -c -json vlan show
{
"swp1": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10,
"vlanEnd": 13
}
],
"swp2": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
},{
"vlan": 10,
"vlanEnd": 13
}
],
"br0": [{
"vlan": 1,
"flags": ["PVID","Egress Untagged"
]
}
]
}
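
The relationship between the two JSON outputs above (one entry per VLAN versus a "vlan"/"vlanEnd" pair) can be illustrated with a small range-collapsing routine. This is only a sketch of the idea and is not taken from the patch; the type vlan_range and the function compress_vlan_ids are hypothetical names introduced for this example, and the input is assumed to be sorted.

```c
#include <stddef.h>

/* Hypothetical type for this sketch; not a structure from iproute2. */
struct vlan_range {
	unsigned short start;
	unsigned short end;	/* equal to start for a single VLAN */
};

/*
 * Collapse a sorted list of VLAN ids into contiguous ranges.
 * Returns the number of ranges written to out (at most max_out).
 */
size_t compress_vlan_ids(const unsigned short *vids, size_t n,
			 struct vlan_range *out, size_t max_out)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (count > 0 && vids[i] == out[count - 1].end + 1) {
			out[count - 1].end = vids[i];	/* extend current range */
			continue;
		}
		if (count == max_out)
			break;				/* output buffer is full */
		out[count].start = vids[i];
		out[count].end = vids[i];
		count++;
	}
	return count;
}
```

For the ids 1, 10, 11, 12, 13 shown for swp1, this yields a single-VLAN range for 1 and one range from 10 to 13, which matches the shape of the compressed sample.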

Signed-off-by: Roopa Prabhu 
---
 bridge/br_common.h |   1 +
 bridge/bridge.c|   5 ++-
 bridge/vlan.c  | 109 ++---
 3 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 5ea45c9..c649e7d 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -23,4 +23,5 @@ extern int show_stats;
 extern int show_details;
 extern int timestamp;
 extern int compress_vlans;
+extern int json_output;
 extern struct rtnl_handle rth;
diff --git a/bridge/bridge.c b/bridge/bridge.c
index 72f153f..5ff038d 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -23,6 +23,7 @@ int oneline;
 int show_stats;
 int show_details;
 int compress_vlans;
+int json_output;
 int timestamp;
 char *batch_file;
 int force;
@@ -38,7 +39,7 @@ static void usage(void)
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
-"   -c[ompressvlans] }\n");
+"   -c[ompressvlans] -j{son} }\n");
exit(-1);
 }
 
@@ -173,6 +174,8 @@ main(int argc, char **argv)
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
++force;
+   } else if (matches(opt, "-json") == 0) {
+   ++json_output;
} else if (matches(opt, "-batch") == 0) {
argc--;
argv++;
diff --git a/bridge/vlan.c b/bridge/vlan.c
index 717025a..ba4dfbc 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "libnetlink.h"
@@ -15,6 +16,8 @@
 
 static unsigned int filter_index, filter_vlan;
 
+json_writer_t *jw_global = NULL;
+
 static void usage(void)
 {
fprintf(stderr, "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ 
pvid] [ untagged ]\n");
@@ -158,6 +161,28 @@ static int filter_vlan_check(struct bridge_vlan_info 
*vinfo)
return 1;
 }
 
+static void print_vlan_port(FILE *fp, int ifi_index)
+{
+   if (jw_global) {
+   jsonw_pretty(jw_global, 1);
+   jsonw_name(jw_global,
+  ll_index_to_name(ifi_index));
+   jsonw_start_array(jw_global);
+   } else {
+   fprintf(fp, "%s",
+   ll_index_to_name(ifi_index));
+   }
+}
+
+static void start_json_vlan_flags_array(bool *vlan_flags)
+{
+   if (*vlan_flags)
+   return;
+   jsonw_name(jw_global, "flags");
+   jsonw_start_array(jw_global);
+   *vlan_flags = true;
+}
+
 static int print_vlan(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg)
@@ -166,6 +191,8 @@ static int print_vlan(const struct sockaddr_nl *who,
struct ifinfomsg *ifm = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[IFLA_MAX+1];
+   bool vlan_flags;
+   char flags[80]

[PATCH iproute2 net-next v4 1/5] json_writer: Removed automatic json-object type from the constructor

2016-06-22 Thread Roopa Prabhu

From: Anuradha Karuppiah 

Top level can be any json type and can be created using
jsonw_start_object/jsonw_end_object etc.

Signed-off-by: Anuradha Karuppiah 
---
 lib/json_writer.c | 8 
 misc/ifstat.c | 7 +++
 misc/nstat.c  | 6 ++
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/lib/json_writer.c b/lib/json_writer.c
index 2af16e1..9fc05e9 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -33,7 +33,7 @@ struct json_writer {
 static void jsonw_indent(json_writer_t *self)
 {
unsigned i;
-   for (i = 0; i <= self->depth; ++i)
+   for (i = 0; i < self->depth; ++i)
fputs("", self->out);
 }
 
@@ -102,7 +102,6 @@ json_writer_t *jsonw_new(FILE *f)
self->depth = 0;
self->pretty = false;
self->sep = '\0';
-   putc('{', self->out);
}
return self;
 }
@@ -113,8 +112,7 @@ void jsonw_destroy(json_writer_t **self_p)
json_writer_t *self = *self_p;
 
assert(self->depth == 0);
-   jsonw_eol(self);
-   fputs("}\n", self->out);
+   fputs("\n", self->out);
fflush(self->out);
free(self);
*self_p = NULL;
@@ -269,6 +267,7 @@ int main(int argc, char **argv)
 {
json_writer_t *wr = jsonw_new(stdout);
 
+   jsonw_start_object(wr);
jsonw_pretty(wr, true);
jsonw_name(wr, "Vyatta");
jsonw_start_object(wr);
@@ -305,6 +304,7 @@ int main(int argc, char **argv)
 
jsonw_end_object(wr);
 
+   jsonw_end_object(wr);
jsonw_destroy(&wr);
return 0;
 }
diff --git a/misc/ifstat.c b/misc/ifstat.c
index abbb4e7..d551973 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -245,6 +245,7 @@ static void dump_raw_db(FILE *fp, int to_hist)
 
h = hist_db;
if (jw) {
+   jsonw_start_object(jw);
jsonw_pretty(jw, pretty);
jsonw_name(jw, info_source);
jsonw_start_object(jw);
@@ -288,6 +289,8 @@ static void dump_raw_db(FILE *fp, int to_hist)
}
if (jw) {
jsonw_end_object(jw);
+
+   jsonw_end_object(jw);
jsonw_destroy(&jw);
}
 }
@@ -451,6 +454,7 @@ static void dump_kern_db(FILE *fp)
struct ifstat_ent *n;
 
if (jw) {
+   jsonw_start_object(jw);
jsonw_pretty(jw, pretty);
jsonw_name(jw, info_source);
jsonw_start_object(jw);
@@ -477,6 +481,7 @@ static void dump_incr_db(FILE *fp)
 
h = hist_db;
if (jw) {
+   jsonw_start_object(jw);
jsonw_pretty(jw, pretty);
jsonw_name(jw, info_source);
jsonw_start_object(jw);
@@ -509,6 +514,8 @@ static void dump_incr_db(FILE *fp)
 
if (jw) {
jsonw_end_object(jw);
+
+   jsonw_end_object(jw);
jsonw_destroy(&jw);
}
 }
diff --git a/misc/nstat.c b/misc/nstat.c
index a9e0f20..8bd3a1a 100644
--- a/misc/nstat.c
+++ b/misc/nstat.c
@@ -284,6 +284,7 @@ static void dump_kern_db(FILE *fp, int to_hist)
 
h = hist_db;
if (jw) {
+   jsonw_start_object(jw);
jsonw_pretty(jw, pretty);
jsonw_name(jw, info_source);
jsonw_start_object(jw);
@@ -317,6 +318,8 @@ static void dump_kern_db(FILE *fp, int to_hist)
 
if (jw) {
jsonw_end_object(jw);
+
+   jsonw_end_object(jw);
jsonw_destroy(&jw);
}
 }
@@ -328,6 +331,7 @@ static void dump_incr_db(FILE *fp)
 
h = hist_db;
if (jw) {
+   jsonw_start_object(jw);
jsonw_pretty(jw, pretty);
jsonw_name(jw, info_source);
jsonw_start_object(jw);
@@ -364,6 +368,8 @@ static void dump_incr_db(FILE *fp)
 
if (jw) {
jsonw_end_object(jw);
+
+   jsonw_end_object(jw);
jsonw_destroy(&jw);
}
 }
-- 
1.9.1
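
A minimal sketch of the calling pattern this change implies: the constructor no longer emits an opening brace, so the caller decides whether the top level is an object or an array. The demo_* names below are a stand-in written for this example so that it compiles on its own; they are not the json_writer API, and the real writer prints to a FILE rather than a buffer.

```c
#include <stddef.h>

/* Stand-in for json_writer_t, for this sketch only. */
struct demo_writer {
	char *buf;	/* caller-provided output buffer */
	size_t cap;	/* size of buf in bytes */
	size_t len;	/* bytes written so far, excluding the NUL */
};

/* Append one character if there is room, keeping buf NUL-terminated. */
static void demo_putc(struct demo_writer *w, char c)
{
	if (w->len + 1 < w->cap) {
		w->buf[w->len++] = c;
		w->buf[w->len] = '\0';
	}
}

/* Like the constructor after this patch: set up state, emit nothing. */
void demo_new(struct demo_writer *w, char *buf, size_t cap)
{
	w->buf = buf;
	w->cap = cap;
	w->len = 0;
	if (cap > 0)
		buf[0] = '\0';
}

/* The caller now opens and closes the top-level type explicitly. */
void demo_start_object(struct demo_writer *w) { demo_putc(w, '{'); }
void demo_end_object(struct demo_writer *w)   { demo_putc(w, '}'); }
void demo_start_array(struct demo_writer *w)  { demo_putc(w, '['); }
void demo_end_array(struct demo_writer *w)    { demo_putc(w, ']'); }
```

The same shape is visible in the diff above: ifstat and nstat now wrap their output in jsonw_start_object()/jsonw_end_object(), while a caller that wants a top-level array can start one instead.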

[PATCH iproute2 net-next v4 4/5] bridge: add json schema for bridge fdb show

2016-06-22 Thread Roopa Prabhu

From: Anuradha Karuppiah 

Storing the schema file for the json format will be useful for doc
purposes as optional parameters are typically suppressed in the json
sample outputs.

Signed-off-by: Anuradha Karuppiah 
---
 schema/bridge_fdb_schema.json | 62 +++
 1 file changed, 62 insertions(+)
 create mode 100644 schema/bridge_fdb_schema.json

diff --git a/schema/bridge_fdb_schema.json b/schema/bridge_fdb_schema.json
new file mode 100644
index 000..3e5be8d
--- /dev/null
+++ b/schema/bridge_fdb_schema.json
@@ -0,0 +1,62 @@
+{
+"$schema": "http://json-schema.org/draft-04/schema#",
+"description": "bridge fdb show",
+"type": "array",
+"items": {
+"type": "object",
+"properties": {
+"dev": {
+"type": "string"
+},
+"dst": {
+"description" : "host name or ip address",
+"type": "string"
+},
+"flags": {
+"type": "array",
+"items": {
+"enum": ["self", "master", "router", "offload"]
+},
+"uniqueItems": true
+},
+"linkNetNsId": {
+"type": "integer"
+},
+"mac": {
+"type": "string"
+},
+"master": {
+"type": "string"
+},
+"opCode": {
+"description" : "used to indicate fdb entry del",
+"enum": ["deleted"]
+},
+"port": {
+"type": "integer"
+},
+"state": {
+"description" : "permanent, static, stale, state=#x",
+"type": "string"
+},
+"updated": {
+"type": "integer"
+},
+"used": {
+"type": "integer"
+},
+"viaIf": {
+"type": "string"
+},
+"viaIfIndex": {
+"type": "integer"
+},
+"vlan": {
+"type": "integer"
+},
+"vni": {
+"type": "integer"
+}
+}
+}
+}
-- 
1.9.1

[PATCH iproute2 net-next v4 5/5] bridge: update man page

2016-06-22 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 man/man8/bridge.8 | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 08e8a5b..abaee63 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -20,8 +20,9 @@ bridge \- show / manipulate bridge addresses and devices
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
 \fB\-s\fR[\fItatistics\fR] |
-\fB\-n\fR[\fIetns\fR] name }
-\fB\-b\fR[\fIatch\fR] filename }
+\fB\-n\fR[\fIetns\fR] name |
+\fB\-b\fR[\fIatch\fR] filename |
+\fB\-j\fR[\fIson\fR] }
 
 .ti -8
 .BR "bridge link set"
@@ -153,6 +154,10 @@ Don't terminate bridge command on errors in batch mode.
 If there were any errors during execution of the commands, the application
 return code will be non zero.
 
+.TP
+.BR "\-json"
+Display results in JSON format. Currently available for vlan and fdb.
+
 .SH BRIDGE - COMMAND SYNTAX
 
 .SS
-- 
1.9.1

Re: [patch net-next v4 0/4] return offloaded stats as default and expose original sw stats

2016-06-22 Thread Roopa Prabhu

On Mon, Jun 20, 2016 at 5:28 AM, Jamal Hadi Salim  wrote:
> On 16-06-19 11:14 PM, Roopa Prabhu wrote:
>>
>> On Fri, Jun 17, 2016 at 10:12 AM, Florian Fainelli 
>> wrote:
>
>
>
>>
>> I have also mentioned this before, the default api must provide
>> accumulated (hw and sw) stats...,
>> because this is the api that the user queries on an interface.
>
>
> Sorry - I missed those discussions.
> What is current practise? Do people request for one via ip link
> stats and the other via ethtool?
> What do you guys do in your implementation?

For us, the standard netlink API that returns netdev stats includes all
stats, hw and sw. When I say hw and sw, I mean that some of the error
counters can also include errors counted by sw.

ethtool stats have always provided drivers/users with additional stats
that the hw or driver can expose.

> Yes, it would be more accurate to provide aggregated stats but
> it may break backward compat if expectation is both are read
> separately today.

I don't think people see netdev stats as sw and ethtool as hw stats.
The latter just provides more granularity for debugging.
That's the way I have always looked at it.

> Maybe it makes sense to have a brand new TLV for these aggregated
> stats as Jiri was suggesting.That means two new TLVs not one.
> 1) TLV for aggregated stats - which cant be current one
> 2) TLV for h/w stats
>
> The existing stat implies s/ware only.
>

I don't think the existing stat implies s/ware stats only, so I think we
should be careful about changing the meaning of existing stats.

Stats for logical devices like the bridge have always been software
only, but with switchdev the way we see or implement these is to also
include hardware stats when they are hw offloaded. For us, bridge vlan
stats, vxlan stats and so on will follow the same model. You cannot
introduce separate sw and hw stats for these. All stats will have to
follow a consistent model.

Thanks,
Roopa

[PATCH iproute2 net-next v4 3/5] bridge: add json support for bridge fdb show

2016-06-22 Thread Roopa Prabhu

From: Anuradha Karuppiah 

Sample output:
$bridge -j fdb show
[{
"mac": "44:38:39:00:69:88",
"dev": "swp2s0",
"vlan": 2,
"master": "br0",
"state": "permanent"
},{
"mac": "00:02:00:00:00:01",
"dev": "swp2s0",
"vlan": 2,
"master": "br0"
},{
"mac": "00:02:00:00:00:02",
"dev": "swp2s1",
"vlan": 2,
"master": "br0"
},{
"mac": "44:38:39:00:69:89",
"dev": "swp2s1",
"master": "br0",
"state": "permanent"
},{
"mac": "44:38:39:00:69:89",
"dev": "swp2s1",
"vlan": 2,
"master": "br0",
"state": "permanent"
},{
"mac": "44:38:39:00:69:88",
"dev": "br0",
"master": "br0",
"state": "permanent"
}
]

Signed-off-by: Anuradha Karuppiah 
Signed-off-by: Roopa Prabhu 
---
 bridge/fdb.c | 210 +++
 1 file changed, 167 insertions(+), 43 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index be849f9..3d1ef6c 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "libnetlink.h"
 #include "br_common.h"
@@ -29,6 +31,8 @@
 
 static unsigned int filter_index, filter_vlan;
 
+json_writer_t *jw_global;
+
 static void usage(void)
 {
fprintf(stderr, "Usage: bridge fdb { add | append | del | replace } 
ADDR dev DEV\n"
@@ -59,6 +63,15 @@ static const char *state_n2a(unsigned int s)
return buf;
 }
 
+static void start_json_fdb_flags_array(bool *fdb_flags)
+{
+   if (*fdb_flags)
+   return;
+   jsonw_name(jw_global, "flags");
+   jsonw_start_array(jw_global);
+   *fdb_flags = true;
+}
+
 int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 {
FILE *fp = arg;
@@ -66,11 +79,12 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
int len = n->nlmsg_len;
struct rtattr *tb[NDA_MAX+1];
__u16 vid = 0;
+   bool fdb_flags = false;
+   const char *state_s;
 
if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-
return 0;
}
 
@@ -86,6 +100,11 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (filter_index && filter_index != r->ndm_ifindex)
return 0;
 
+   if (jw_global) {
+   jsonw_pretty(jw_global, 1);
+   jsonw_start_object(jw_global);
+   }
+
parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
 n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
@@ -95,40 +114,75 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (filter_vlan && filter_vlan != vid)
return 0;
 
-   if (n->nlmsg_type == RTM_DELNEIGH)
-   fprintf(fp, "Deleted ");
+   if (n->nlmsg_type == RTM_DELNEIGH) {
+   if (jw_global)
+   jsonw_string_field(jw_global, "opCode", "deleted");
+   else
+   fprintf(fp, "Deleted ");
+   }
 
if (tb[NDA_LLADDR]) {
SPRINT_BUF(b1);
-   fprintf(fp, "%s ",
-   ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
-   RTA_PAYLOAD(tb[NDA_LLADDR]),
-   ll_index_to_type(r->ndm_ifindex),
-   b1, sizeof(b1)));
+   ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
+   RTA_PAYLOAD(tb[NDA_LLADDR]),
+   ll_index_to_type(r->ndm_ifindex),
+   b1, sizeof(b1));
+   if (jw_global)
+   jsonw_string_field(jw_global, "mac", b1);
+   else
+   fprintf(fp, "%s ", b1);
}
 
-   if (!filter_index && r->ndm_ifindex)
-   fprintf(fp, "dev %s ", ll_index_to_name(r->ndm_ifindex));
+   if (!filter_index && r->ndm_ifindex) {
+   if (jw_global)
+   jsonw_string_field(jw_global,

Re: [PATCH iproute2 net-next v4 0/5] bridge: json support for fdb and vlan show

2016-06-22 Thread Roopa Prabhu

On Wed, Jun 22, 2016 at 7:53 AM, Jiri Pirko  wrote:
> Wed, Jun 22, 2016 at 03:45:50PM CEST, ro...@cumulusnetworks.com wrote:
>>From: Roopa Prabhu 
>>
>>This patch series adds json support for a few bridge show commands.
>>We plan to follow up with json support for additional commands soon.
>
> I'm just curious, what is you use case for this? Apps can use rtnetlink
> socket directly.

The most important use case is automation/orchestration tools.
They use existing Linux tools to query and configure. iproute2 output
is not machine readable today, and JSON is an industry standard.

Re: [PATCH iproute2 net-next v4 0/5] bridge: json support for fdb and vlan show

2016-06-22 Thread Roopa Prabhu

On Wed, Jun 22, 2016 at 11:10 AM, Stephen Hemminger
 wrote:
> On Wed, 22 Jun 2016 16:53:44 +0200
> Jiri Pirko  wrote:
>
>> Wed, Jun 22, 2016 at 03:45:50PM CEST, ro...@cumulusnetworks.com wrote:
>> >From: Roopa Prabhu 
>> >
>> >This patch series adds json support for a few bridge show commands.
>> >We plan to follow up with json support for additional commands soon.
>>
>> I'm just curious, what is you use case for this? Apps can use rtnetlink
>> socket directly.
>
> Try using netlink in perl or python, it is quite difficult.

yep, ++

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-22 Thread Roopa Prabhu

On Tue, Jun 21, 2016 at 8:15 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> The problem we try to handle is about offloaded forwarded packets
> which are not seen by kernel. Let me try to draw it:
>
> port1   port2 (HW stats are counted here)
>   \  /
>\/
> \  /
>  --(A) ASIC --(B)--
> |
>(C)
> |
>CPU (SW stats are counted here)
>
>
> Now we have couple of flows for TX and RX (direction does not matter here):
>
> 1) port1->A->ASIC->C->CPU
>
>For this flow, HW and SW stats are equal.
>
> 2) port1->A->ASIC->C->CPU->C->ASIC->B->port2
>
>For this flow, HW and SW stats are equal.
>
> 3) port1->A->ASIC->B->port2
>
>For this flow, SW stats are 0.
>
> The purpose of this patchset is to provide facility for user to
> find out the difference between flows 1+2 and 3. In other words, user
> will be able to see the statistics for the slow-path (through kernel).
>
> Also note that HW stats are what someone calls "accumulated" stats.
> Every packet counted by SW is also counted by HW. Not the other way around.
>
> As a default the accumulated stats (HW) will be exposed to user
> so the userspace apps can react properly.
>
>

Curious, how do you plan to handle virtual device counters like vlan
and vxlan stats? We can't separate CPU and HW stats there. In some
cases (or ASICs) HW counters do not include CPU generated packets...
you will have to add CPU generated pkt counters to the hw counters for
such virtual device stats.

Example: in the switchdev model, for bridge vlan stats, when the user
queries bridge vlan stats, you will have to add the hw stats to the
bridge driver vlan stats and return the result to the user.

Having a consistent model for all kinds of stats will help.
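
The two accounting models in this exchange can be contrasted with a short sketch. Both functions and their parameter names are hypothetical, introduced only for illustration; they are not kernel code. The first follows the quoted cover letter, where every packet counted by SW is also counted by HW. The second follows the virtual-device case raised above, where the HW counter may not include CPU generated packets.

```c
#include <stdint.h>

/*
 * Physical port, per the quoted cover letter: hw_total already includes
 * everything sw_seen counted, so the offloaded (fast-path) share is the
 * difference. Clamp at zero in case the counters were sampled at
 * slightly different times.
 */
uint64_t port_offloaded_pkts(uint64_t hw_total, uint64_t sw_seen)
{
	return hw_total >= sw_seen ? hw_total - sw_seen : 0;
}

/*
 * Virtual device (for example a bridge vlan) on an ASIC whose counter
 * does not see CPU generated packets: the value reported to the user
 * is the sum of the two.
 */
uint64_t vdev_total_pkts(uint64_t hw_counted, uint64_t cpu_generated)
{
	return hw_counted + cpu_generated;
}
```

For flow 3 in the quoted diagram, sw_seen is 0 and the whole HW count is offloaded traffic; for flows 1 and 2 the two counters are equal and the offloaded share is 0.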

Re: [PATCH iproute2 net-next v4 0/5] bridge: json support for fdb and vlan show

2016-06-23 Thread Roopa Prabhu

On Wed, Jun 22, 2016 at 1:00 PM, Jiri Pirko  wrote:
> Wed, Jun 22, 2016 at 08:10:47PM CEST, step...@networkplumber.org wrote:
>>On Wed, 22 Jun 2016 16:53:44 +0200
>>Jiri Pirko  wrote:
>>
>>> Wed, Jun 22, 2016 at 03:45:50PM CEST, ro...@cumulusnetworks.com wrote:
>>> >From: Roopa Prabhu 
>>> >
>>> >This patch series adds json support for a few bridge show commands.
>>> >We plan to follow up with json support for additional commands soon.
>>>
>>> I'm just curious, what is you use case for this? Apps can use rtnetlink
>>> socket directly.
>>
>>Try using netlink in perl or python, it is quite difficult.
>
> pyroute2? Quite easy...

None of the implementations out there are complete, nor can they
compete with iproute2. iproute2 is maintained by the netdev community
and is always up to date with the latest networking API.

Nothing against pyroute2, but we wrote our own for other reasons and we
carry the additional burden of maintaining it and keeping it up to date
for every networking API that gets added to iproute2 (and the
implementation of netlink is often very easy in C).

Also, for external automation and orchestration tools (to whom this
patch set is addressed), there is no reason to write and maintain their
own tools using netlink when they can use iproute2 directly to create a
link or query its properties.

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-23 Thread Roopa Prabhu

On 6/22/16, 10:40 PM, Jiri Pirko wrote:
> Wed, Jun 22, 2016 at 09:32:25PM CEST, ro...@cumulusnetworks.com wrote:
>> On Tue, Jun 21, 2016 at 8:15 AM, Jiri Pirko  wrote:
>>> From: Jiri Pirko 
>>>
>>> The problem we try to handle is about offloaded forwarded packets
>>> which are not seen by kernel. Let me try to draw it:
>>>
>>> port1   port2 (HW stats are counted here)
>>>   \  /
>>>\/
>>> \  /
>>>  --(A) ASIC --(B)--
>>> |
>>>(C)
>>> |
>>>CPU (SW stats are counted here)
>>>
>>>
>>> Now we have couple of flows for TX and RX (direction does not matter here):
>>>
>>> 1) port1->A->ASIC->C->CPU
>>>
>>>For this flow, HW and SW stats are equal.
>>>
>>> 2) port1->A->ASIC->C->CPU->C->ASIC->B->port2
>>>
>>>For this flow, HW and SW stats are equal.
>>>
>>> 3) port1->A->ASIC->B->port2
>>>
>>>For this flow, SW stats are 0.
>>>
>>> The purpose of this patchset is to provide facility for user to
>>> find out the difference between flows 1+2 and 3. In other words, user
>>> will be able to see the statistics for the slow-path (through kernel).
>>>
>>> Also note that HW stats are what someone calls "accumulated" stats.
>>> Every packet counted by SW is also counted by HW. Not the other way around.
>>>
>>> As a default the accumulated stats (HW) will be exposed to user
>>> so the userspace apps can react properly.
>>>
>>>
>> curious, how do you plan to handle virtual device counters like vlan
>> and vxlan stats ?.
> Yes, that is another problem (1). We have to push stats up to this devices
> most probably. But that problem is orthogonal to this. To the user, you
> will still need 2 sets of stats and HW stats being default. So this
> patchset infra is going to be used as well.
Hmm... but I don't think we should start adding different TLVs for hw and
sw for every stats variant we add.
>
>
>> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
>> counters do not include CPU generated packets... you will have to add
>> CPU generated pkt counters to the hw counters for such virtual device
>> stats.
> Can you please provide an example how that could happen?

The example is the bridge vlan stats I mention below. These are usually
counted by attaching hw virtual counter resources. And CPU generated
packets in some cases may be set up to bypass the ASIC pipeline because
the CPU has already made the required decisions. So they may not be
counted by such hw virtual counters.

>
>
>> example: In the switchdev model, for bridge vlan stats, when user
>> queries bridge vlan stats,
>> you will have to add the hw stats to the bridge driver vlan stats and
>> return it to the user .
> Yep, that is (1).

Unless I misunderstood, this does not look like (1). In (1) you say hw
stats already reflect sw stats. But in this case, the hw counter does not
include sw stats for CPU generated packets.
>
>
>> Having a consistent model for all kinds of stats will help.

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-25 Thread Roopa Prabhu

On Thu, Jun 23, 2016 at 8:40 AM, Jiri Pirko  wrote:
> Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com wrote:
>>>>>> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
>>>>>> counters do not include CPU generated packets... you will have to add
>>>>>> CPU generated pkt counters to the hw counters for such virtual device
>>>>>> stats.
>>>>> Can you please provide an example how that could happen?
>>>>
>>>> example is the bridge vlan stats I mention below. These are usually counted
>>>> by attaching hw virtual counter resources. And CPU generated packets
>>>> in some cases may be set up to bypass the ASIC pipeline because the CPU
>>>> has already made the required decisions. So, they may not be counted
>>>> by such hw virtual counters.
>>>
>>> Bypass ASIC? How do the packets get on the wire?
>>>
>>
>>Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
>>ASIC ships the CPU generated packet out of the switch/front-panel
>>port. Continuing Roopa's example of vlan netdev stats... To get the
>>HW stats, counters are typically tied to the ingress and egress vlan hw
>>entries. All the incoming packets are subject to the ingress vlan
>>lookup irrespective of whether they get punted to the CPU or whether
>>they are forwarded to another front panel port. In that case the
>>ingress HW stats does represent all packets. However for CPU
>>originated packets egress vlan lookups are bypassed in the ASIC (this
>>is common forwarding option in most ASICs) and the packet shipped as
>>is out of front-panel port specified by the CPU. Which means these
>>packets will NOT be counted against the egress VLAN HW counter; hence
>>the need for summation.
>
> Driver will know about this, and will provide the stats accordingly to
> the core. Who else than driver should resolve this.
>

The point was/is that there should be only two categories:
1) The base default stats: these can contain 'only sw', 'only hw' or 'a
summation of hw and sw' in some cases.
The user does not care about the breakdown.

2) Everything else falls into the second category: a driver-provided
breakdown of stats for easier debugging.
Today this is ethtool stats, and we can have an equivalent nested
attribute for this in the new stats API.
Let's call it IFLA_STATS_LINK_DRIVER, or you pick a name. Let's make it
nested and extensible (like ethtool is) so that the driver can expose
any kind of stats there.
I.e., let's move the stats you are proposing into this category,
instead of introducing a third category, 'SW stats'.

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-26 Thread Roopa Prabhu

On 6/26/16, 2:33 AM, Jiri Pirko wrote:
> Sat, Jun 25, 2016 at 05:50:59PM CEST, ro...@cumulusnetworks.com wrote:
>> On Thu, Jun 23, 2016 at 8:40 AM, Jiri Pirko  wrote:
>>> Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com wrote:
 we can't separate CPU and HW stats there. In some cases (or ASICs) HW
 counters do
 not include CPU generated packetsyou will have to add CPU
 generated pkt counters to the
 hw counters for such virtual device stats.
>>> Can you please provide an example how that could happen?
>> example is the bridge vlan stats I mention below. These are usually 
>> counted
>> by attaching hw virtual counter resources. And CPU generated packets
>> in some cases maybe setup to bypass the ASIC pipeline because the CPU
>> has already made the required decisions. So, they may not be counted by
>> by such hw virtual counters.
> Bypass ASIC? How do the packets get on the wire?
>
 Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
 ASIC ships the CPU generated packet out of the switch/front-panel
 port. Continuing Roopa's example of vlan netdev stats To get the
 HW stats counters are typically tied to the ingress and egress vlan hw
 entries. All the incoming packets are subject to the ingress vlan
 lookup irrespective of whether they get punted to the CPU or whether
 they are forwarded to another front panel port. In that case the
 ingress HW stats does represent all packets. However for CPU
 originated packets egress vlan lookups are bypassed in the ASIC (this
 is common forwarding option in most ASICs) and the packet shipped as
 is out of front-panel port specified by the CPU. Which means these
 packets will NOT be counted against the egress VLAN HW counter; hence
 the need for summation.
>>> Driver will know about this, and will provide the stats accordingly to
>>> the core. Who else than driver should resolve this.
>>>
>> The point was/is that there should be only two categories:
>> 1) the base default stats: can contain 'only sw', 'only hw' or 'a
>> summation of hw and sw' in some cases.
>> The user does not care about the breakdown.
>>
>> 2) everything else falls into the second category: driver provided
>> breakdown of stats for easier debugging.
>> This today is ethtool stats and we can have an equivalent nested
>> attribute for this in the new stats api.
>> Lets call it IFLA_STATS_LINK_DRIVER or you pick a name. Lets make it
>> nested and extensible (like ethtool is) and
>> driver can expose any kind of stats there.
>> ie lets move the stats you are proposing to this category of stats.
>> instead of introducing a third category 'SW stats'.
> What you are proposing is essentially what our patchset does. We expose
> 2 sets of stats. hw and pure sw. hw includes all, driver will take
> care of it cause he knows what is going on in hw.
The splitting into hw and sw is causing some confusion with respect
to existing stats and will be confusing for future stats. And I am not sure how
many users would prefer the split this way.
So, instead of doing the split, I think we should at this time introduce
driver specific stats (like ethtool) as a nested netlink attribute.
>
> Btw mirroring random string stats into Netlink is not a good idea IMO.
Any reason you say that? I am thinking it would be much easier with netlink.
Keeping it simple, it is a nested attribute with stat-name and value pairs.

struct stat {
	char stats_name[STATS_NAME_LEN]; /* STATS_NAME_LEN = 32 */
	__u64 stat;
};

IFLA_STATS_LINK_DRIVER is a nested attribute with multiple
IFLA_STATS_LINK_DRIVER_ENTRY attributes of type 'struct stat'.

(If using a struct is a concern: each IFLA_STATS_LINK_DRIVER_ENTRY can be a
nested attribute of stat-name/value pair. Though it does not seem very
necessary in this case).

PS: not fond of the name IFLA_STATS_LINK_DRIVER...any other suggestions are
welcome.
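
For illustration only, here is a minimal userspace sketch of how such a nested attribute of name/value entries could be laid out and walked. It does not use any kernel or libnl API: the attribute header simply mirrors the layout of struct nlattr, and the attribute types DRV_STATS_NEST and DRV_STATS_ENTRY, the struct name drv_stat and all helper names are hypothetical stand-ins for the IFLA_STATS_LINK_DRIVER idea described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATS_NAME_LEN 32
#define ATTR_ALIGN(len) (((len) + 3u) & ~3u)	/* same 4-byte rule as NLA_ALIGN */

/* Hypothetical attribute types; nothing like this exists in the uapi headers. */
enum { DRV_STATS_NEST = 1, DRV_STATS_ENTRY = 2 };

struct attr_hdr {		/* mirrors the layout of struct nlattr */
	uint16_t len;		/* header + payload, excluding padding */
	uint16_t type;
};

struct drv_stat {		/* the 'struct stat' proposed above, renamed */
	char     name[STATS_NAME_LEN];
	uint64_t value;
};

/* Append one attribute to buf at *off; returns 0, or -1 if it does not fit. */
static int put_attr(uint8_t *buf, size_t cap, size_t *off,
		    uint16_t type, const void *data, uint16_t dlen)
{
	struct attr_hdr h = { (uint16_t)(sizeof(struct attr_hdr) + dlen), type };
	size_t need = ATTR_ALIGN(h.len);

	if (*off + need > cap)
		return -1;
	memset(buf + *off, 0, need);
	memcpy(buf + *off, &h, sizeof(h));
	if (dlen)
		memcpy(buf + *off + sizeof(h), data, dlen);
	*off += need;
	return 0;
}

/* Build nest { entry, entry, ... }; returns total bytes used or -1. */
static long build_driver_stats(uint8_t *buf, size_t cap,
			       const struct drv_stat *stats, size_t n)
{
	size_t off = 0, i;
	struct attr_hdr nest;

	if (put_attr(buf, cap, &off, DRV_STATS_NEST, NULL, 0))
		return -1;
	for (i = 0; i < n; i++)
		if (put_attr(buf, cap, &off, DRV_STATS_ENTRY,
			     &stats[i], sizeof(stats[i])))
			return -1;

	/* Patch the nest length now that all children are in place. */
	memcpy(&nest, buf, sizeof(nest));
	nest.len = (uint16_t)off;
	memcpy(buf, &nest, sizeof(nest));
	return (long)off;
}

/* Walk the nest and return the value for 'name', or def if absent. */
static uint64_t find_driver_stat(const uint8_t *buf, long total,
				 const char *name, uint64_t def)
{
	size_t off = sizeof(struct attr_hdr);

	while (off + sizeof(struct attr_hdr) <= (size_t)total) {
		struct attr_hdr h;
		struct drv_stat s;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h))
			break;
		if (h.type == DRV_STATS_ENTRY &&
		    h.len == sizeof(h) + sizeof(s)) {
			memcpy(&s, buf + off + sizeof(h), sizeof(s));
			if (strncmp(s.name, name, STATS_NAME_LEN) == 0)
				return s.value;
		}
		off += ATTR_ALIGN(h.len);
	}
	return def;
}
```

A fixed-size struct per entry keeps the walk trivial; the alternative mentioned above (each entry itself a nest of a name attribute and a value attribute) would cost one more header per field but avoid padding short names to 32 bytes.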

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-26 Thread Roopa Prabhu

On Sun, Jun 26, 2016 at 11:15 AM, Jiri Pirko  wrote:
> Sun, Jun 26, 2016 at 08:06:40PM CEST, ro...@cumulusnetworks.com wrote:
>>On 6/26/16, 2:33 AM, Jiri Pirko wrote:
>>> Sat, Jun 25, 2016 at 05:50:59PM CEST, ro...@cumulusnetworks.com wrote:
 On Thu, Jun 23, 2016 at 8:40 AM, Jiri Pirko  wrote:
> Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com wrote:
>> we can't separate CPU and HW stats there. In some cases (or ASICs) HW
>> counters do
>> not include CPU generated packetsyou will have to add CPU
>> generated pkt counters to the
>> hw counters for such virtual device stats.
> Can you please provide an example how that could happen?
 example is the bridge vlan stats I mention below. These are usually 
 counted
 by attaching hw virtual counter resources. And CPU generated packets
 in some cases maybe setup to bypass the ASIC pipeline because the CPU
 has already made the required decisions. So, they may not be counted by
 by such hw virtual counters.
>>> Bypass ASIC? How do the packets get on the wire?
>>>
>> Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
>> ASIC ships the CPU generated packet out of the switch/front-panel
>> port. Continuing Roopa's example of vlan netdev stats To get the
>> HW stats counters are typically tied to the ingress and egress vlan hw
>> entries. All the incoming packets are subject to the ingress vlan
>> lookup irrespective of whether they get punted to the CPU or whether
>> they are forwarded to another front panel port. In that case the
>> ingress HW stats does represent all packets. However for CPU
>> originated packets egress vlan lookups are bypassed in the ASIC (this
>> is common forwarding option in most ASICs) and the packet shipped as
>> is out of front-panel port specified by the CPU. Which means these
>> packets will NOT be counted against the egress VLAN HW counter; hence
>> the need for summation.
> Driver will know about this, and will provide the stats accordingly to
> the core. Who else than driver should resolve this.
>
 The point was/is that there should be only two categories:
 1) the base default stats: can contain 'only sw', 'only hw' or 'a
 summation of hw and sw' in some cases.
 The user does not care about the breakdown.

 2) everything else falls into the second category: driver provided
 breakdown of stats for easier debugging.
 This today is ethtool stats and we can have an equivalent nested
 attribute for this in the new stats api.
 Lets call it IFLA_STATS_LINK_DRIVER or you pick a name. Lets make it
 nested and extensible (like ethtool is) and
 driver can expose any kind of stats there.
 ie lets move the stats you are proposing to this category of stats.
 instead of introducing a third category 'SW stats'.
>>> What you are proposing is essentially what our patchset does. We expose
>>> 2 sets of stats. hw and pure sw. hw includes all, driver will take
>>> care of it cause he knows what is going on in hw.
>>the splitting into hw and sw is causing some confusion with respect
>
> I still don't get why you are talking about split :( I see no split.
>
>
>>to existing stats and will be confusing for future stats. And i am not sure 
>>how many
>>users would prefer the split this way.
>>So, instead of doing the split, i think we should at this time introduce
>>driver specific stats (like ethtool) as a nested netlink attribute.
>
> The default netlink stats should be hw (or accumulated as you call it).
> The reason is to avoid confusion for existing apps. Another attribute is
> possible for more break-out stats - that is what this patchset is doing.
>
> Ethtool stats are wrong and useless for apps as they are driver-specific.

Apps only care about overall stats. That's the aggregate stats
provided by the default netlink netdev api to the user...which already exists.

They don't care about your new breakdown either.

Breakdown of stats is used for debugging and that's what ethtool stats provide.



>
>
>>>
>>> Btw mirroring random string stats into Netlink is not a good idea IMO.
>>Any reason you say that ?. I am thinking it would be much easier with netlink.
>>keeping it simple, it is a nested attribute with stat-name and value pair.
>>
>>struct stat {
>>char stats_name[STATS_NAME_LEN];/* STATS_NAME_LEN = 32 */
>>__u64 stat;
>>};
>
> No please. This should be well defined generic group of stats.
> Driver-specific names/stats stats are wrong.
>

They are meant for debugging. Are you saying the new stats api should
not contain 'ethtool like' stats?

ethtool stats are very valuable today. They are extensible.
They cannot be made generic and they are specific to a hardware or use case.

We use it for our switch port stats too. Base aggregate stats summed
up and provided as

Re: [patch net-next v5 0/4] return offloaded stats as default and expose original sw stats

2016-06-27 Thread Roopa Prabhu

[resending ...my previous reply sent some non-text content]

On Sun, Jun 26, 2016 at 11:51 PM, Jiri Pirko  wrote:
>
> Mon, Jun 27, 2016 at 04:53:53AM CEST, ro...@cumulusnetworks.com wrote:
> >On Sun, Jun 26, 2016 at 11:15 AM, Jiri Pirko  wrote:
> >> Sun, Jun 26, 2016 at 08:06:40PM CEST, ro...@cumulusnetworks.com wrote:
> >>>On 6/26/16, 2:33 AM, Jiri Pirko wrote:
>  Sat, Jun 25, 2016 at 05:50:59PM CEST, ro...@cumulusnetworks.com wrote:
> > On Thu, Jun 23, 2016 at 8:40 AM, Jiri Pirko  wrote:
> >> Thu, Jun 23, 2016 at 05:11:26PM CEST, anurad...@cumulusnetworks.com 
> >> wrote:
> >>> we can't separate CPU and HW stats there. In some cases (or 
> >>> ASICs) HW
> >>> counters do
> >>> not include CPU generated packetsyou will have to add CPU
> >>> generated pkt counters to the
> >>> hw counters for such virtual device stats.
> >> Can you please provide an example how that could happen?
> > example is the bridge vlan stats I mention below. These are usually 
> > counted
> > by attaching hw virtual counter resources. And CPU generated packets
> > in some cases maybe setup to bypass the ASIC pipeline because the 
> > CPU
> > has already made the required decisions. So, they may not be 
> > counted by
> > by such hw virtual counters.
>  Bypass ASIC? How do the packets get on the wire?
> 
> >>> Bypass the "forwarding pipeline" in the ASIC that is. Obviously the
> >>> ASIC ships the CPU generated packet out of the switch/front-panel
> >>> port. Continuing Roopa's example of vlan netdev stats To get the
> >>> HW stats counters are typically tied to the ingress and egress vlan hw
> >>> entries. All the incoming packets are subject to the ingress vlan
> >>> lookup irrespective of whether they get punted to the CPU or whether
> >>> they are forwarded to another front panel port. In that case the
> >>> ingress HW stats does represent all packets. However for CPU
> >>> originated packets egress vlan lookups are bypassed in the ASIC (this
> >>> is common forwarding option in most ASICs) and the packet shipped as
> >>> is out of front-panel port specified by the CPU. Which means these
> >>> packets will NOT be counted against the egress VLAN HW counter; hence
> >>> the need for summation.
> >> Driver will know about this, and will provide the stats accordingly to
> >> the core. Who else than driver should resolve this.
> >>
> > The point was/is that there should be only two categories:
> > 1) the base default stats: can contain 'only sw', 'only hw' or 'a
> > summation of hw and sw' in some cases.
> > The user does not care about the breakdown.
> >
> > 2) everything else falls into the second category: driver provided
> > breakdown of stats for easier debugging.
> > This today is ethtool stats and we can have an equivalent nested
> > attribute for this in the new stats api.
> > Lets call it IFLA_STATS_LINK_DRIVER or you pick a name. Lets make it
> > nested and extensible (like ethtool is) and
> > driver can expose any kind of stats there.
> > ie lets move the stats you are proposing to this category of stats.
> > instead of introducing a third category 'SW stats'.
>  What you are proposing is essentially what our patchset does. We expose
>  2 sets of stats. hw and pure sw. hw includes all, driver will take
>  care of it cause he knows what is going on in hw.
> >>>the splitting into hw and sw is causing some confusion with respect
> >>
> >> I still don't get why you are talking about split :( I see no split.
> >>
> >>
> >>>to existing stats and will be confusing for future stats. And i am not 
> >>>sure how many
> >>>users would prefer the split this way.
> >>>So, instead of doing the split, i think we should at this time introduce
> >>>driver specific stats (like ethtool) as a nested netlink attribute.
> >>
> >> The default netlink stats should be hw (or accumulated as you call it).
> >> The reason is to avoid confusion for existing apps. Another attribute is
> >> possible for more break-out stats - that is what this patchset is doing.
> >>
> >> Ethtool stats are wrong and useless for apps as they are driver-specific.
> >
> >apps only care about overall stats. thats the aggregate stats
> >provided by the default netlink netdev api to the user...which already 
> >exists.
> >
> >they don't care about your new breakdown either.
>
> Agreed. That is what our patchset is doing.
>
>
> >
> >breakdown of stats are used for debugging and thats what ethtool stats 
> >provide.
> >
> >
> >
> >>
> >>
> 
>  Btw mirroring random string stats into Netlink is not a good idea IMO.
> >>>Any reason you say that ?. I am thinking it would be much easier with 
> >>>netlink.
> >>>keeping it simple, it is a nested attribute with stat-name and value pair

[PATCH iproute2] bridge: support for static fdb entries

2016-01-27 Thread Roopa Prabhu

From: Roopa Prabhu 

There is no intuitive option to add static fdb entries today.
'temp' seems to have a side effect of adding
'static' fdb entries. But the name and intent
of 'temp' does not say anything about it being static.

example:
bridge fdb add operates as follows:

$bridge fdb add 00:01:02:03:04:05 dev eth0 master
$bridge fdb add 00:01:02:03:04:06 dev eth0 master temp
$bridge fdb add 00:01:02:03:04:07 dev eth0 master local

$bridge fdb show
00:01:02:03:04:05 dev eth0 permanent
00:01:02:03:04:06 dev eth0 static
00:01:02:03:04:07 dev eth0 permanent
00:01:02:03:04:08 dev eth0 <<== dynamic, ageable learned mac

This patch adds a new bridge fdb type 'static' which
makes sure NUD_NOARP and NUD_REACHABLE are set for static
entries. This is effectively what 'temp'
does today, but the name 'temp' is misleading.

After the patch:
$bridge fdb add 00:01:02:03:04:06 dev eth0 master static

$bridge fdb show
00:01:02:03:04:06 dev eth0 static

'temp' could ideally be a dynamic mac that can age (ie just
NUD_REACHABLE). But, 'temp' sets 'NUD_NOARP' and 'NUD_REACHABLE'.
Too late to change 'temp' now. But, we are thinking of introducing a
'dynamic' keyword after this patch that only sets NUD_REACHABLE.

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
---
Will submit another patch to document bridge fdb options
once we agree on the behaviour and this patch is accepted.

 bridge/fdb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index 4d10925..9bc6b94 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -33,7 +33,7 @@ static void usage(void)
 {
fprintf(stderr, "Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
"  [ self ] [ master ] [ use ] [ router ]\n"
-   "  [ local | temp ] [ dst IPADDR ] [ vlan VID ]\n"
+   "  [ local | temp | static ] [ dst IPADDR ] [ vlan VID ]\n"
"  [ port PORT] [ vni VNI ] [ via DEV ]\n");
fprintf(stderr, "   bridge fdb [ show [ br BRDEV ] [ brport DEV ] ]\n");
exit(-1);
@@ -301,7 +301,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
} else if (matches(*argv, "local") == 0||
   matches(*argv, "permanent") == 0) {
req.ndm.ndm_state |= NUD_PERMANENT;
-   } else if (matches(*argv, "temp") == 0) {
+   } else if (matches(*argv, "temp") == 0 ||
+  matches(*argv, "static") == 0) {
req.ndm.ndm_state |= NUD_REACHABLE;
} else if (matches(*argv, "vlan") == 0) {
if (vid >= 0)
-- 
1.9.1
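
As a rough sketch of the keyword-to-state mapping described in this patch, the helper below returns the ndm_state bits for each keyword. The helper name fdb_state_for_keyword is hypothetical and not part of iproute2. The NUD_* values are the ones from linux/neighbour.h, repeated so the sketch builds anywhere. It assumes the request starts out with NUD_NOARP set and falls back to a permanent entry when no state keyword is given, which matches the 'bridge fdb show' output above; the 'dynamic' branch reflects the proposed follow-up, not existing behaviour. Exact string compares are used here instead of the prefix matching that matches() performs.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in linux/neighbour.h; repeated so the sketch builds anywhere. */
#define NUD_REACHABLE 0x02
#define NUD_NOARP     0x40
#define NUD_PERMANENT 0x80

/*
 * Hypothetical helper: returns the ndm_state that "bridge fdb add" would
 * send for a given keyword (NULL = no keyword). Assumes the request starts
 * out as NUD_NOARP and falls back to a permanent entry.
 */
static unsigned int fdb_state_for_keyword(const char *kw)
{
	unsigned int state = NUD_NOARP;

	if (kw && (!strcmp(kw, "local") || !strcmp(kw, "permanent")))
		state |= NUD_PERMANENT;
	else if (kw && (!strcmp(kw, "temp") || !strcmp(kw, "static")))
		state |= NUD_REACHABLE;
	else if (kw && !strcmp(kw, "dynamic"))
		state = NUD_REACHABLE;	/* proposed follow-up: ageable entry */

	/* No explicit state given: default to a permanent entry. */
	if (!(state & (NUD_PERMANENT | NUD_REACHABLE)))
		state |= NUD_PERMANENT;
	return state;
}
```

With this mapping 'temp' and 'static' are indistinguishable on the wire, which is the point of the patch: 'static' is only a clearer name for the same state.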

[PATCH iproute2 v2] bridge: add batch command support

2015-10-11 Thread Roopa Prabhu

From: Wilson Kok  

This patch adds support to batch bridge commands.
Follows ip batch code.

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
Acked-by: Christophe Gouault 
---
v2 - change tab to space in usage as pointed out by Christophe Gouault

 bridge/bridge.c   | 59 +++
 man/man8/bridge.8 | 11 +++
 2 files changed, 70 insertions(+)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index eaf09c8..72f153f 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "SNAPSHOT.h"
 #include "utils.h"
@@ -23,6 +24,8 @@ int show_stats;
 int show_details;
 int compress_vlans;
 int timestamp;
+char *batch_file;
+int force;
 const char *_SL_;
 
 static void usage(void) __attribute__((noreturn));
@@ -31,6 +34,7 @@ static void usage(void)
 {
fprintf(stderr,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
+"   bridge [ -force ] -batch filename\n"
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
@@ -71,6 +75,50 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1;
 }
 
+static int batch(const char *name)
+{
+   char *line = NULL;
+   size_t len = 0;
+   int ret = EXIT_SUCCESS;
+
+   if (name && strcmp(name, "-") != 0) {
+   if (freopen(name, "r", stdin) == NULL) {
+   fprintf(stderr,
+   "Cannot open file \"%s\" for reading: %s\n",
+   name, strerror(errno));
+   return EXIT_FAILURE;
+   }
+   }
+
+   if (rtnl_open(&rth, 0) < 0) {
+   fprintf(stderr, "Cannot open rtnetlink\n");
+   return EXIT_FAILURE;
+   }
+
+   cmdlineno = 0;
+   while (getcmdline(&line, &len, stdin) != -1) {
+   char *largv[100];
+   int largc;
+
+   largc = makeargs(line, largv, 100);
+   if (largc == 0)
+   continue;   /* blank line */
+
+   if (do_cmd(largv[0], largc, largv)) {
+   fprintf(stderr, "Command failed %s:%d\n",
+   name, cmdlineno);
+   ret = EXIT_FAILURE;
+   if (!force)
+   break;
+   }
+   }
+   if (line)
+   free(line);
+
+   rtnl_close(&rth);
+   return ret;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -123,6 +171,14 @@ main(int argc, char **argv)
exit(-1);
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
+   } else if (matches(opt, "-force") == 0) {
+   ++force;
+   } else if (matches(opt, "-batch") == 0) {
+   argc--;
+   argv++;
+   if (argc <= 1)
+   usage();
+   batch_file = argv[1];
} else {
fprintf(stderr,
"Option \"%s\" is unknown, try \"bridge help\".\n",
@@ -134,6 +190,9 @@ main(int argc, char **argv)
 
_SL_ = oneline ? "\\" : "\n";
 
+   if (batch_file)
+   return batch(batch_file);
+
if (rtnl_open(&rth, 0) < 0)
exit(1);
 
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 5347a56..d45c728 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -21,6 +21,7 @@ bridge \- show / manipulate bridge addresses and devices
 \fB\-V\fR[\fIersion\fR] |
 \fB\-s\fR[\fItatistics\fR] |
 \fB\-n\fR[\fIetns\fR] name }
+\fB\-b\fR[\fIatch\fR] filename }
 
 .ti -8
 .BR "bridge link set"
@@ -137,6 +138,16 @@ to
 .RI "-n[etns] " NETNS " [ " OPTIONS " ] " OBJECT " { " COMMAND " | "
 .BR help " }"
 
+.TP
+.BR "\-b", " \-batch " 
+Read commands from provided file or standard input and invoke them.
+First failure will cause termination of bridge command.
+
+.TP
+.BR "\-force"
+Don't terminate bridge command on errors in batch mode.
+If there were any errors during execution of the commands, the application
+return code will be non zero.
 
 .SH BRIDGE - COMMAND SYNTAX
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
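
The -force semantics documented in the man page hunk above can be sketched independently of iproute2 as follows. This is only an illustration: run_batch, cmd_fn and toy_cmd are hypothetical names, and the callback stands in for do_cmd(). As in the patch, any failure makes the overall result non-zero, and without force the loop stops at the first failing command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type standing in for do_cmd(); returns 0 on success. */
typedef int (*cmd_fn)(const char *line);

/*
 * Run each line through 'run'. Without force, stop at the first failure;
 * with force, keep going. Either way a failure makes the result EXIT_FAILURE.
 * *executed (optional) receives the number of commands attempted.
 */
static int run_batch(const char *const *lines, size_t n, int force,
		     cmd_fn run, size_t *executed)
{
	int ret = EXIT_SUCCESS;
	size_t i, done = 0;

	for (i = 0; i < n; i++) {
		if (lines[i][0] == '\0')
			continue;	/* blank line */
		done++;
		if (run(lines[i])) {
			ret = EXIT_FAILURE;
			if (!force)
				break;
		}
	}
	if (executed)
		*executed = done;
	return ret;
}

/* Toy command: anything starting with '!' fails. */
static int toy_cmd(const char *line)
{
	return line[0] == '!' ? -1 : 0;
}
```

Keeping the exit status sticky means a script that runs with -force can still detect that at least one command in the batch failed.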

[PATCH iproute2] ip monitor neigh: Change 'delete' to 'Deleted' to be consistent with ip route

2015-10-15 Thread Roopa Prabhu

From: Roopa Prabhu 

It helps to grep for one string "Deleted" when monitoring all events.

Fixes: 6ea3ebafe077 ("iproute2: inform user when a neighbor is removed")
Signed-off-by: Roopa Prabhu 
---
I am not sure if it is too late for this change, but I am sending this patch
out because it only affects ip monitor output.

 ip/ipneigh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index a9e23f4..ce57ede 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -256,7 +256,7 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
}
 
if (n->nlmsg_type == RTM_DELNEIGH)
-   fprintf(fp, "delete ");
+   fprintf(fp, "Deleted ");
else if (n->nlmsg_type == RTM_GETNEIGH)
fprintf(fp, "miss ");
if (tb[NDA_DST]) {
-- 
1.9.1


[PATCH iproute2] bridge: add calls to fflush in fdb and mdb print functions

2015-10-15 Thread Roopa Prabhu

From: Wilson Kok 

This patch adds fflush in fdb and mdb print functions

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
---
 bridge/fdb.c | 2 ++
 bridge/mdb.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index bd7e4f9..5ea50ab 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -163,6 +163,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
fprintf(fp, "offload ");
 
fprintf(fp, "%s\n", state_n2a(r->ndm_state));
+   fflush(fp);
+
return 0;
 }
 
diff --git a/bridge/mdb.c b/bridge/mdb.c
index b14bd01..24c4903 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -135,6 +135,8 @@ int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
}
}
 
+   fflush(fp);
+
return 0;
 }
 
-- 
1.9.1


[PATCH iproute2] ip route get: change exit to return to support batch commands

2015-10-15 Thread Roopa Prabhu

From: Roopa Prabhu 

replace exit with return -2 on rtnl_talk failure

Signed-off-by: Roopa Prabhu 
---
 ip/iproute.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index da25548..b137f55 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1643,7 +1643,7 @@ static int iproute_get(int argc, char **argv)
req.r.rtm_family = AF_INET;
 
if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0)
-   exit(2);
+   return -2;
 
if (connected && !from_ok) {
struct rtmsg *r = NLMSG_DATA(&req.n);
-- 
1.9.1


[PATCH net-next v4 2/2] mpls: flow-based multipath selection

2015-10-18 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) while still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 88 ++
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ebefdd4..79154f7 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return &rt->rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (likely(dec.label >= MPLS_LABEL_FIRST_UNRESERVED)) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   } else if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
+   hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return &rt->rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -145,7 +218,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
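
The label-stack part of the selection described in this patch can be sketched in plain C as below. This is a simplified illustration, not the kernel code: it works on an array of label stack entries already in host byte order, it omits the IPv4/IPv6 payload hashing, and mix32 is a stand-in FNV-1a style mixer rather than the kernel's jhash. The names select_nexthop, mix32 and LSE are hypothetical; the constants carry the values used by the kernel (entropy label indicator 7, first unreserved label 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MP_SELECT_LABELS        4
#define MPLS_LABEL_ENTROPY          7	/* entropy label indicator, RFC 6790 */
#define MPLS_LABEL_FIRST_UNRESERVED 16

/* Build a label stack entry: label (20 bits) | TC (3) | S (1) | TTL (8). */
#define LSE(label, bos) \
	(((uint32_t)(label) << 12) | ((uint32_t)(bos) << 8) | 64u)

/* Stand-in mixer (NOT the kernel's jhash): FNV-1a over one 32-bit word. */
static uint32_t mix32(uint32_t word, uint32_t hash)
{
	int i;

	for (i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return hash;
}

/*
 * stack[] holds label stack entries in host byte order.
 * Returns the index of the nexthop to use out of nhn candidates.
 */
static unsigned int select_nexthop(const uint32_t *stack, size_t depth,
				   unsigned int nhn)
{
	uint32_t hash = 2166136261u;
	int eli_seen = 0;
	size_t i;

	if (nhn <= 1)
		return 0;	/* only one path: nothing to choose */

	for (i = 0; i < depth && i < MAX_MP_SELECT_LABELS; i++) {
		uint32_t label = stack[i] >> 12;
		int bos = (stack[i] >> 8) & 1;

		if (label >= MPLS_LABEL_FIRST_UNRESERVED) {
			hash = mix32(label, hash);
			if (eli_seen)
				break;	/* entropy label hashed: stop here */
		} else if (label == MPLS_LABEL_ENTROPY) {
			eli_seen = 1;	/* next label is the entropy label */
		}
		if (bos)
			break;
	}
	return hash % nhn;
}
```

Because only unreserved labels feed the hash and the walk stops right after the entropy label, two packets of the same flow always map to the same nexthop regardless of TTL or of anything deeper in the stack.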

[PATCH net-next v4 1/2] mpls: multipath route support

2015-10-18 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as an array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all label counts to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple nexthops:
it cannot match against any single nexthop's dev. So, this patch removes the
unused 'dev' handling in mpls_route_update.

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 498 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 403 insertions(+), 149 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..ebefdd4 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return &rt->rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -166,6 +145,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
int err;
+   int nhid

[PATCH net-next v4 0/2] mpls: multipath support

2015-10-18 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
label counts to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (1):
  mpls: multipath support

Robert Shearman (1):
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman

v4:
- address a few comments from Eric Biederman
Plan to address the following pending comments in incremental patches
after these infrastructure changes go in:
- Move VIA size to 16 bytes
- use ipv6 flow label in ecmp calculations
- dead route handling during multipath route selection (I had planned
this in an incremental patch initially).

 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next v5 0/2] mpls: multipath support

2015-10-22 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
labels to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (1):
  mpls: multipath support

Robert Shearman (1):
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman

v4:
- address a few comments from Eric Biederman
Plan to address the following pending comments in incremental patches
after these infrastructure changes go in:
- Move VIA size to 16 bytes
- use ipv6 flow label in ecmp calculations
- dead route handling during multipath route selection (I had planned
this in an incremental patch initially).

v5:
feedback from Eric Biederman
- Removed some dead code
feedback from Robert
- Moved dev_put into find_outdev to make it clear that we don't need
a hold on the dev because we are under rtnl
- move the unused variable fix into the correct patch file

 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1


[PATCH net-next v5 1/2] mpls: multipath route support

2015-10-22 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as an array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way, and the dev handling gets tricky with multiple nexthops
because the update cannot match against any single nexthop's dev. So, this
patch removes the unused 'dev' handling in mpls_route_update.
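
As a rough illustration of the route/nexthop split listed above, here is a
minimal userspace sketch. The sketch_* names and fields are hypothetical
simplifications and not the definitions from net/mpls/internal.h; the only
details taken from the patch are the u8 label count, the nexthop array, and
the placeholder selection that returns the first nexthop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NEW_LABELS 2	/* mirrors MAX_NEW_LABELS in the patch */

/* Hypothetical, simplified nexthop: one label forwarding entry. */
struct sketch_nh {
	uint32_t label[SKETCH_MAX_NEW_LABELS];
	uint8_t  labels;	/* number of labels in use (a u8, as in the patch) */
	int      ifindex;	/* stand-in for the output device */
};

/* Hypothetical, simplified route: a header followed by a nexthop array. */
struct sketch_route {
	uint8_t          nhn;	/* number of nexthops */
	struct sketch_nh nh[];	/* flexible array member */
};

/* Allocate a zeroed route with room for nhn nexthops. */
static struct sketch_route *sketch_route_alloc(uint8_t nhn)
{
	struct sketch_route *rt;

	assert(nhn > 0);
	rt = calloc(1, sizeof(*rt) + nhn * sizeof(rt->nh[0]));
	if (rt)
		rt->nhn = nhn;
	return rt;
}

/* Bytes of label stack pushed by one nexthop (4 bytes per label). */
static size_t sketch_nh_header_size(const struct sketch_nh *nh)
{
	return nh->labels * sizeof(uint32_t);
}

/* Placeholder selection, as in this patch: always the first nexthop. */
static struct sketch_nh *sketch_select_multipath(struct sketch_route *rt)
{
	return &rt->nh[0];
}
```

Keeping the nexthops in one allocation with the route means a lookup touches
a single contiguous object, which is presumably part of why the series moved
back from lists to arrays.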

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 493 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 398 insertions(+), 149 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..3f95499 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return &rt->rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -196,8 +175,12 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
if (!rt)
goto drop;
 
+   nh = mpls_select_multipath(rt);
+

[PATCH net-next v5 2/2] mpls: flow-based multipath selection

2015-10-22 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) while still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
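
The selection rules above can be sketched in userspace C as follows. This is
only an illustration under stated assumptions: the sk_* names are
hypothetical, the mixer is a simple FNV-1a style stand-in rather than the
kernel's jhash, the packet is a flat byte buffer rather than an skb, and the
IPv6 case is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_MP_SELECT_LABELS    4
#define SK_LABEL_ENTROPY_INDICATOR 7	/* ELI, RFC 6790 */
#define SK_LABEL_FIRST_UNRESERVED  16

/* Stand-in mixer (FNV-1a style); the patch itself uses jhash. */
static uint32_t sk_mix(uint32_t hash, uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return hash;
}

/* Read a big-endian 32-bit value from the packet buffer. */
static uint32_t sk_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hash the label stack, then the IPv4 3-tuple if the payload is IPv4. */
static uint32_t sk_flow_hash(const uint8_t *pkt, size_t len)
{
	uint32_t hash = 2166136261u;
	size_t off = 0;
	int eli_seen = 0;
	int bos = 0;

	for (int i = 0; i < SK_MAX_MP_SELECT_LABELS && !bos; i++) {
		uint32_t entry, label;

		if (len - off < 4)
			return hash;	/* truncated label stack */
		entry = sk_load_be32(pkt + off);
		label = entry >> 12;	/* top 20 bits */
		bos = (entry >> 8) & 1;	/* bottom-of-stack bit */
		off += 4;

		if (label >= SK_LABEL_FIRST_UNRESERVED) {
			hash = sk_mix(hash, label);
			if (eli_seen)
				return hash;	/* entropy label hashed: stop */
		} else if (label == SK_LABEL_ENTROPY_INDICATOR) {
			eli_seen = 1;	/* next label is the entropy label */
		}
		/* other reserved labels are skipped, per RFC 6790 */
	}

	/* Look at the payload only once the bottom of stack was reached. */
	if (bos && len - off >= 20 && (pkt[off] >> 4) == 4) {
		hash = sk_mix(hash, sk_load_be32(pkt + off + 12)); /* src */
		hash = sk_mix(hash, sk_load_be32(pkt + off + 16)); /* dst */
		hash = sk_mix(hash, pkt[off + 9]);                 /* proto */
	}
	return hash;
}

/* Map the flow hash onto one of nhn nexthops. */
static unsigned int sk_select_nexthop(const uint8_t *pkt, size_t len,
				      unsigned int nhn)
{
	assert(nhn > 0);
	if (nhn == 1)
		return 0;	/* single path: no need to parse the packet */
	return sk_flow_hash(pkt, len) % nhn;
}
```

Because only labels and L3 fields feed the hash, packets of one flow always
map to the same nexthop index, while TTL and traffic-class bits have no
influence on the choice.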

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 87 +++---
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 3f95499..c6392aa 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return &rt->rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (likely(dec.label >= MPLS_LABEL_FIRST_UNRESERVED)) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   } else if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
+   hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return &rt->rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -175,7 +248,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
if (!rt)
goto drop;
 
-   nh = mpls_select

[PATCH net-next v6 1/2] mpls: multipath route support

2015-10-23 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as an array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way, and the dev handling gets tricky with multiple
nexthops because the update cannot match against any single nexthop's dev.
So, this patch removes the unused 'dev' handling in mpls_route_update.

- dead route/path handling will be implemented in a subsequent patch

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
Acked-by: Robert Shearman 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 496 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 401 insertions(+), 149 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..ee3097a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return &rt->rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -196,8 +175,12 @@ static int mpls_forward(struct sk_buff *skb, struct

[PATCH net-next v6 0/2] mpls: multipath support

2015-10-23 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
labels to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (1):
  mpls: multipath support

Robert Shearman (1):
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 
Acked-by: Eric W. Biederman 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman

v4:
- address a few comments from Eric Biederman
Plan to address the following pending comments in incremental patches
after these infrastructure changes go in:
- Move VIA size to 16 bytes
- use ipv6 flow label in ecmp calculations
- dead route handling during multipath route selection (I had planned
this in an incremental patch initially).

v5:
feedback from Eric Biederman
- Removed some dead code
feedback from Robert
- Moved dev_put into find_outdev to make it clear that we don't need
a hold on the dev because we are under rtnl
- move the unused variable fix into the correct patch file

v6:
- fix checkpatch errors
- Still see one pending checkpatch error "ERROR: Macros with
complex values should be enclosed in parentheses
#859: FILE: net/mpls/internal.h:70:
+#define endfor_nexthops(rt) } "

I picked this macro from ip_fib.h and there are other places in
the kernel that define the same macro. See some discussions
on mailing lists that this could be spurious.
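
For readers unfamiliar with the pattern, a minimal userspace sketch of such a
macro pair follows. The sk_* names and the struct layout are hypothetical and
only modelled on the for_nexthops/endfor_nexthops pair in ip_fib.h. The
opening macro opens a brace and declares the iteration variables; the closing
macro expands to the bare "}" that checkpatch complains about.

```c
#include <assert.h>

/* Hypothetical, simplified nexthop and route for illustration only. */
struct sk_nh {
	int ifindex;
};

struct sk_route {
	int          nhn;	/* number of valid entries in nh[] */
	struct sk_nh nh[4];
};

/* Opens a scope, declares nhsel and nh, and starts the loop. */
#define sk_for_nexthops(rt) {					\
	int nhsel; const struct sk_nh *nh;			\
	for (nhsel = 0, nh = (rt)->nh;				\
	     nhsel < (rt)->nhn;					\
	     nh++, nhsel++)

/* Closes the scope opened by sk_for_nexthops(). */
#define sk_endfor_nexthops(rt) }

/* Count the nexthops of a route that use the given interface index. */
static int sk_count_on_dev(const struct sk_route *rt, int ifindex)
{
	int count = 0;

	assert(rt->nhn >= 0 && rt->nhn <= 4);
	sk_for_nexthops(rt) {
		if (nh->ifindex == ifindex)
			count++;
	} sk_endfor_nexthops(rt);

	return count;
}
```

Since the closing macro cannot be wrapped in parentheses without breaking the
construct, the checkpatch report is hard to satisfy for this pattern.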

 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1


[PATCH net-next v6 2/2] mpls: flow-based multipath selection

2015-10-23 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) while still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman 
Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c | 87 +++---
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ee3097a..cc972e3 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return &rt->rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (likely(dec.label >= MPLS_LABEL_FIRST_UNRESERVED)) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   } else if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
+   hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return &rt->rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -175,7 +248,7 @@ stat

[PATCH net-next] bridge: set is_local and is_static before fdb entry is added to the fdb hashtable

2015-10-26 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem Description:
We can add fdbs pointing to the bridge with a NULL ->dst, but this has a
few race conditions. br_fdb_insert() first creates the fdb and only sets
"is_local" to 1 after the fdb has been published/linked. If a packet
arrives for that fdb in this window, it may see the entry as non-local
and either do a NULL ptr dereference in br_forward() or attach the fdb
to the port where the packet arrived. br_fdb_insert() then makes the
entry local, leaving a wrong fdb entry.

Call chain: br_handle_frame_finish() -> br_forward().
br_handle_frame_finish() only calls br_forward() when the dst is not
local, i.e. skb != NULL; whenever the dst is found to be local, skb is
set to NULL so that it is not forwarded. Because packet forwarding runs
only under RCU, it can see the entry before "is_local" is set to 1 and
actually try to dereference NULL.

The main issue is that if someone sends a packet to the switch while
the entry which points to the bridge device is being added, the
forwarding path may dereference a NULL ptr. This fix is needed now that
we can add fdbs pointing to the bridge. The same window is a problem for
br_fdb_update() as well: while someone is adding a bridge fdb, but
before it has is_local == 1, the entry might get moved to a port if the
address arrives as a source mac, and then get its "is_local" set to 1.

This patch changes fdb_create to take is_local and is_static as
arguments to set these values in the fdb entry before it is added to the
hash. Also adds null check for port in br_forward.
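
The ordering fix can be illustrated outside the kernel with a minimal sketch
that uses C11 atomics in place of the RCU list primitives. Everything named
sk_* is hypothetical, the list is prepend-only, and a single writer is
assumed (in the kernel the writer holds br->hash_lock). The point is only
that every flag is written before the entry becomes reachable by readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified fdb entry. */
struct sk_fdb {
	unsigned char addr[6];
	unsigned char is_local;
	unsigned char is_static;
	struct sk_fdb *next;
};

/* Head of a prepend-only list that lockless readers traverse. */
static _Atomic(struct sk_fdb *) sk_head;

/* Initialise the whole entry first, then publish it. */
static struct sk_fdb *sk_fdb_create(const unsigned char *addr,
				    unsigned char is_local,
				    unsigned char is_static)
{
	struct sk_fdb *fdb;

	assert(addr != NULL);
	fdb = malloc(sizeof(*fdb));
	if (!fdb)
		return NULL;

	memcpy(fdb->addr, addr, sizeof(fdb->addr));
	fdb->is_local = is_local;
	fdb->is_static = is_static;
	fdb->next = atomic_load_explicit(&sk_head, memory_order_relaxed);

	/* Release store: the writes above become visible before the pointer. */
	atomic_store_explicit(&sk_head, fdb, memory_order_release);
	return fdb;
}

/* Reader side: never observes an entry with half-initialised flags. */
static const struct sk_fdb *sk_fdb_find(const unsigned char *addr)
{
	const struct sk_fdb *fdb;

	fdb = atomic_load_explicit(&sk_head, memory_order_acquire);
	for (; fdb; fdb = fdb->next)
		if (!memcmp(fdb->addr, addr, sizeof(fdb->addr)))
			return fdb;
	return NULL;
}
```

Setting is_local after the store to sk_head would reopen the window described
in the problem description, which is why the flags are passed into the
creation function instead of being set by the caller afterwards.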

Reported-by: Nikolay Aleksandrov 
Signed-off-by: Roopa Prabhu 
---
 net/bridge/br_fdb.c | 15 ---
 net/bridge/br_forward.c |  2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index c88bd8e..35a1c7e 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -495,7 +495,9 @@ static struct net_bridge_fdb_entry *fdb_find_rcu(struct hlist_head *head,
 static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
   struct net_bridge_port *source,
   const unsigned char *addr,
-  __u16 vid)
+  __u16 vid,
+  unsigned char is_local,
+  unsigned char is_static)
 {
struct net_bridge_fdb_entry *fdb;
 
@@ -504,8 +506,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
memcpy(fdb->addr.addr, addr, ETH_ALEN);
fdb->dst = source;
fdb->vlan_id = vid;
-   fdb->is_local = 0;
-   fdb->is_static = 0;
+   fdb->is_local = is_local;
+   fdb->is_static = is_static;
fdb->added_by_user = 0;
fdb->added_by_external_learn = 0;
fdb->updated = fdb->used = jiffies;
@@ -536,11 +538,10 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
fdb_delete(br, fdb);
}
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 1, 1);
if (!fdb)
return -ENOMEM;
 
-   fdb->is_local = fdb->is_static = 1;
fdb_add_hw_addr(br, addr);
fdb_notify(br, fdb, RTM_NEWNEIGH);
return 0;
@@ -597,7 +598,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
} else {
spin_lock(&br->hash_lock);
if (likely(!fdb_find(head, addr, vid))) {
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (fdb) {
if (unlikely(added_by_user))
fdb->added_by_user = 1;
@@ -774,7 +775,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (!fdb)
return -ENOMEM;
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index a9d424e..fcdb86d 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(br_deliver);
 /* called with rcu_read_lock */
 void br_forward(const struct net_bridge_port *to, struct sk_buff *skb, struct sk_buff *skb0)
 {
-   if (should_deliver(to, skb)) {
+   if (to && should_deliver(to, skb)) {
if (skb0)
deliver_clone(to, sk

[PATCH net-next v2] bridge: set is_local and is_static before fdb entry is added to the fdb hashtable

2015-10-27 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem Description:
We can add fdbs pointing to the bridge with a NULL ->dst, but this has a
few race conditions. br_fdb_insert() first creates the fdb and only sets
"is_local" to 1 after the fdb has been published/linked. If a packet
arrives for that fdb in this window, it may see the entry as non-local
and either do a NULL ptr dereference in br_forward() or attach the fdb
to the port where the packet arrived. br_fdb_insert() then makes the
entry local, leaving a wrong fdb entry.

Call chain: br_handle_frame_finish() -> br_forward().
br_handle_frame_finish() only calls br_forward() when the dst is not
local, i.e. skb != NULL; whenever the dst is found to be local, skb is
set to NULL so that it is not forwarded. Because packet forwarding runs
only under RCU, it can see the entry before "is_local" is set to 1 and
actually try to dereference NULL.

The main issue is that if someone sends a packet to the switch while
the entry which points to the bridge device is being added, the
forwarding path may dereference a NULL ptr. This fix is needed now that
we can add fdbs pointing to the bridge. The same window is a problem for
br_fdb_update() as well: while someone is adding a bridge fdb, but
before it has is_local == 1, the entry might get moved to a port if the
address arrives as a source mac, and then get its "is_local" set to 1.

This patch changes fdb_create to take is_local and is_static as
arguments to set these values in the fdb entry before it is added to the
hash. Also adds null check for port in br_forward.

Reported-by: Nikolay Aleksandrov 
Signed-off-by: Roopa Prabhu 
---
v2 - fix compile error reported by kbuild test robot.
 Accidentally I had posted an older version of the patch

 net/bridge/br_fdb.c | 15 ---
 net/bridge/br_forward.c |  2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index c88bd8e..35a1c7e 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -495,7 +495,9 @@ static struct net_bridge_fdb_entry *fdb_find_rcu(struct hlist_head *head,
 static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
   struct net_bridge_port *source,
   const unsigned char *addr,
-  __u16 vid)
+  __u16 vid,
+  unsigned char is_local,
+  unsigned char is_static)
 {
struct net_bridge_fdb_entry *fdb;
 
@@ -504,8 +506,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
memcpy(fdb->addr.addr, addr, ETH_ALEN);
fdb->dst = source;
fdb->vlan_id = vid;
-   fdb->is_local = 0;
-   fdb->is_static = 0;
+   fdb->is_local = is_local;
+   fdb->is_static = is_static;
fdb->added_by_user = 0;
fdb->added_by_external_learn = 0;
fdb->updated = fdb->used = jiffies;
@@ -536,11 +538,10 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
fdb_delete(br, fdb);
}
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 1, 1);
if (!fdb)
return -ENOMEM;
 
-   fdb->is_local = fdb->is_static = 1;
fdb_add_hw_addr(br, addr);
fdb_notify(br, fdb, RTM_NEWNEIGH);
return 0;
@@ -597,7 +598,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
} else {
spin_lock(&br->hash_lock);
if (likely(!fdb_find(head, addr, vid))) {
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (fdb) {
if (unlikely(added_by_user))
fdb->added_by_user = 1;
@@ -774,7 +775,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (!fdb)
return -ENOMEM;
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index a9d424e..fcdb86d 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(br_deliver);
 /* called with rcu_read_lock */
 void br_forward(const struct net_bridge_port *to, struct sk_buff *skb, struct sk_buff *skb0)
 {
-   if (should_deliver(to, skb)) {
+

[PATCH net-next v3] bridge: set is_local and is_static before fdb entry is added to the fdb hashtable

2015-10-27 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem Description:
We can add fdbs pointing to the bridge with a NULL ->dst, but this has a
few race conditions. br_fdb_insert() first creates the fdb and only sets
"is_local" to 1 after the fdb has been published/linked. If a packet
arrives for that fdb in this window, it may see the entry as non-local
and either do a NULL ptr dereference in br_forward() or attach the fdb
to the port where the packet arrived. br_fdb_insert() then makes the
entry local, leaving a wrong fdb entry.

Call chain: br_handle_frame_finish() -> br_forward().
br_handle_frame_finish() only calls br_forward() when the dst is not
local, i.e. skb != NULL; whenever the dst is found to be local, skb is
set to NULL so that it is not forwarded. Because packet forwarding runs
only under RCU, it can see the entry before "is_local" is set to 1 and
actually try to dereference NULL.

The main issue is that if someone sends a packet to the switch while
the entry which points to the bridge device is being added, the
forwarding path may dereference a NULL ptr. This fix is needed now that
we can add fdbs pointing to the bridge. The same window is a problem for
br_fdb_update() as well: while someone is adding a bridge fdb, but
before it has is_local == 1, the entry might get moved to a port if the
address arrives as a source mac, and then get its "is_local" set to 1.

This patch changes fdb_create to take is_local and is_static as
arguments to set these values in the fdb entry before it is added to the
hash. Also adds null check for port in br_forward.
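
The ordering this patch relies on can be illustrated outside the kernel.
Below is a minimal userspace sketch, assuming C11 atomics as a stand-in
for the kernel's RCU list publication. struct entry, entry_create(),
entry_first() and table_head are hypothetical names for illustration,
not bridge code. It only shows that the flags are written before the
entry becomes reachable by lockless readers.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fdb entry; not the kernel structure. */
struct entry {
	unsigned char is_local;
	unsigned char is_static;
	struct entry *next;
};

/* Head of a singly linked list that lockless readers may traverse. */
static _Atomic(struct entry *) table_head;

/* Fill in every field first, then publish with release semantics, so a
 * reader that observes the entry also observes its final flags.
 * Writers are assumed to be serialized by a lock, as in the patch.
 */
static struct entry *entry_create(unsigned char is_local,
				  unsigned char is_static)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->is_local = is_local;
	e->is_static = is_static;
	e->next = atomic_load_explicit(&table_head, memory_order_relaxed);
	atomic_store_explicit(&table_head, e, memory_order_release);
	return e;
}

/* Reader side: acquire-load the head; the flags are already valid. */
static struct entry *entry_first(void)
{
	return atomic_load_explicit(&table_head, memory_order_acquire);
}
```

Setting the flags after the release store would reopen the window
described in the problem description above.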

Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
Reported-by: Nikolay Aleksandrov 
Signed-off-by: Roopa Prabhu 
Reviewed-by: Nikolay Aleksandrov 
---
v1 - v2 : fix compilation error reported by kbuild robot

v2 - v3 : really fix the compilation error :( 

 net/bridge/br_fdb.c | 17 +
 net/bridge/br_forward.c |  2 +-
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index c88bd8e..a642bb8 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -495,7 +495,9 @@ static struct net_bridge_fdb_entry *fdb_find_rcu(struct 
hlist_head *head,
 static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
   struct net_bridge_port *source,
   const unsigned char *addr,
-  __u16 vid)
+  __u16 vid,
+  unsigned char is_local,
+  unsigned char is_static)
 {
struct net_bridge_fdb_entry *fdb;
 
@@ -504,8 +506,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
hlist_head *head,
memcpy(fdb->addr.addr, addr, ETH_ALEN);
fdb->dst = source;
fdb->vlan_id = vid;
-   fdb->is_local = 0;
-   fdb->is_static = 0;
+   fdb->is_local = is_local;
+   fdb->is_static = is_static;
fdb->added_by_user = 0;
fdb->added_by_external_learn = 0;
fdb->updated = fdb->used = jiffies;
@@ -536,11 +538,10 @@ static int fdb_insert(struct net_bridge *br, struct 
net_bridge_port *source,
fdb_delete(br, fdb);
}
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 1, 1);
if (!fdb)
return -ENOMEM;
 
-   fdb->is_local = fdb->is_static = 1;
fdb_add_hw_addr(br, addr);
fdb_notify(br, fdb, RTM_NEWNEIGH);
return 0;
@@ -597,7 +598,7 @@ void br_fdb_update(struct net_bridge *br, struct 
net_bridge_port *source,
} else {
spin_lock(&br->hash_lock);
if (likely(!fdb_find(head, addr, vid))) {
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (fdb) {
if (unlikely(added_by_user))
fdb->added_by_user = 1;
@@ -774,7 +775,7 @@ static int fdb_add_entry(struct net_bridge_port *source, 
const __u8 *addr,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   fdb = fdb_create(head, source, addr, vid);
+   fdb = fdb_create(head, source, addr, vid, 0, 0);
if (!fdb)
return -ENOMEM;
 
@@ -1099,7 +1100,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, 
struct net_bridge_port *p,
head = &br->hash[br_mac_hash(addr, vid)];
fdb = fdb_find(head, addr, vid);
if (!fdb) {
-

[PATCH net-next RFC] mpls: support for dead routes

2015-10-29 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for both RTNH_F_DEAD and RTNH_F_LINKDOWN flags.
This resembles ipv4 fib code. I also picked fib_rebalance from
ipv4. Enabled weights support for nexthop, just because the
infrastructure is already there.
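
For readers unfamiliar with the ipv4-style rebalance, here is a rough
userspace sketch of the idea: each live nexthop gets a slice of the
31-bit hash space proportional to its weight, and dead or linkdown
nexthops get an upper bound of -1 so they are never selected. The names
(struct nh, rebalance, select_nh) and the local NH_* constants are
illustrative stand-ins, weights are assumed to be at least 1, and the
hash is masked to 31 bits here as an assumption of this sketch.

```c
#include <stdint.h>

#define NH_DEAD     1	/* local stand-in for RTNH_F_DEAD */
#define NH_LINKDOWN 16	/* local stand-in for RTNH_F_LINKDOWN */

struct nh {
	unsigned int flags;
	int weight;		/* assumed >= 1 */
	int32_t upper_bound;	/* inclusive; -1 means "never select" */
};

/* Recompute the per-nexthop upper bounds after a link event. */
static void rebalance(struct nh *nh, int n)
{
	uint64_t total = 0, w = 0;
	int i;

	for (i = 0; i < n; i++)
		if (!(nh[i].flags & (NH_DEAD | NH_LINKDOWN)))
			total += nh[i].weight;

	for (i = 0; i < n; i++) {
		if (nh[i].flags & (NH_DEAD | NH_LINKDOWN)) {
			nh[i].upper_bound = -1;
			continue;
		}
		w += nh[i].weight;
		/* rounded division, then -1 to make the bound inclusive */
		nh[i].upper_bound =
			(int32_t)(((w << 31) + total / 2) / total - 1);
	}
}

/* Return the index of the first nexthop whose bound covers the hash,
 * or -1 if every nexthop is dead.
 */
static int select_nh(const struct nh *nh, int n, uint32_t hash)
{
	int32_t h = (int32_t)(hash & 0x7fffffff);
	int i;

	for (i = 0; i < n; i++)
		if (h <= nh[i].upper_bound)
			return i;
	return -1;
}
```

With weights 1, 1 (dead), 3 the first nexthop covers a quarter of the
hash space and the third covers the rest.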

Signed-off-by: Roopa Prabhu 
---
I want to get this in before net-next closes as promised.
I have tested it for the dead/linkdown flags. The multipath selection
and hash calculation in the face of dead routes need some more
work. I am short on cycles this week and wanted to get some
early feedback, hence sending this out as RFC. I will continue with some
more testing.  Robert, I am using your hash algo but it needs some more
work with dead routes. If you already have any thoughts on this, I will
take them. Thanks!


 net/mpls/af_mpls.c  | 228 +---
 net/mpls/internal.h |   4 +
 2 files changed, 202 insertions(+), 30 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..7db9678 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -27,6 +27,8 @@
  */
 #define MAX_MP_SELECT_LABELS 4
 
+u32 mpls_multipath_secret __read_mostly;
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -96,22 +98,52 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static void mpls_multipath_rebalance(struct mpls_route *rt)
+{
+   int total;
+   int w;
+
+   if (rt->rt_nhn < 2)
+   return;
+
+   total = 0;
+   for_nexthops(rt) {
+   if ((nh->nh_flags & RTNH_F_DEAD) ||
+   (nh->nh_flags & RTNH_F_LINKDOWN))
+   continue;
+
+   total += nh->nh_weight;
+   } endfor_nexthops(rt);
+
+   w = 0;
+   change_nexthops(rt) {
+   int upper_bound;
+
+   if ((nh->nh_flags & RTNH_F_DEAD) ||
+   (nh->nh_flags & RTNH_F_LINKDOWN)) {
+   upper_bound = -1;
+   } else {
+   w += nh->nh_weight;
+   upper_bound = DIV_ROUND_CLOSEST_ULL((u64)w << 31,
+   total) - 1;
+   }
+
+   atomic_set(&nh->nh_upper_bound, upper_bound);
+   } endfor_nexthops(rt);
+
+   net_get_random_once(&mpls_multipath_secret,
+   sizeof(mpls_multipath_secret));
+}
+
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,9 +197,29 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   for_nexthops(rt) {
+   if (hash > atomic_read(&nh->nh_upper_bound))
+   continue;
+   return nh;
+   } endfor_nexthops(rt);
+
 out:
-   return &rt->rt_nh[nh_index];
+   return &rt->rt_nh[0];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -577,7 +629,7 @@ errout:
 }
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
-struct mpls_nh *nh, int oif,
+struct mpls_nh *nh, int oif, int hops,
 struct nlattr *via, struct nlattr *newdst)
 {
int err = -ENOMEM;
@@ -597,6 +649,7 @@ static int mpls_nh_build(struct net *net, struct mpls_route 
*rt,
if (err)
goto errout;
 
+   nh->nh_weight = hops + 1;
err = mpls_nh_assign_dev(net, rt, nh, oif);
if (err)
goto errout;
@@ -663,10 +716,9 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
if (!rtnh_ok(rtnh, remaining))
goto errout;
 
-

[PATCH net-next] mpls: support for dead routes

2015-11-02 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection
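
As an illustration of the selection described above, here is a small
userspace sketch, assuming a flat array of per-nexthop flags and a
separately maintained count of alive nexthops (the role rt_nhn_alive
plays in the diff below). select_alive() and the NH_* constants are
hypothetical stand-ins, not the kernel code.

```c
#include <stdint.h>

#define NH_DEAD     1	/* local stand-in for RTNH_F_DEAD */
#define NH_LINKDOWN 16	/* local stand-in for RTNH_F_LINKDOWN */

/* Pick the (hash % alive)-th nexthop that is neither dead nor linkdown.
 * Returns its index in the flags array, or -1 if none is usable.
 */
static int select_alive(const unsigned int *flags, int nhn, int alive,
			uint32_t hash)
{
	int target, n = 0, i;

	if (alive <= 0)
		return -1;

	target = (int)(hash % (uint32_t)alive);
	for (i = 0; i < nhn; i++) {
		if (flags[i] & (NH_DEAD | NH_LINKDOWN))
			continue;
		if (n == target)
			return i;
		n++;
	}
	return -1;
}
```

The walk is linear in the number of nexthops, which is the cost of not
sorting or recreating the nexthop array on every link event.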

Signed-off-by: Roopa Prabhu 
---
RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the future


 net/mpls/af_mpls.c  | 193 
 net/mpls/internal.h |   3 +
 2 files changed, 168 insertions(+), 28 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..6b62fcc 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,9 +158,38 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   for_nexthops(rt) {
+   if ((nh->nh_flags & RTNH_F_DEAD) ||
+   (nh->nh_flags & RTNH_F_LINKDOWN))
+   continue;
+   if (n == nh_index)
+   return nh;
+   n++;
+   } endfor_nexthops(rt);
+
 out:
-   return &rt->rt_nh[nh_index];
+   return &rt->rt_nh[0];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -365,6 +387,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 
max_alen)
 GFP_KERNEL);
if (rt) {
rt->rt_nhn = num_nh;
+   rt->rt_nhn_alive = num_nh;
rt->rt_max_alen = max_alen_aligned;
}
 
@@ -536,6 +559,16 @@ static int mpls_nh_assign_dev(struct net *net, struct 
mpls_route *rt,
 
RCU_INIT_POINTER(nh->nh_dev, dev);
 
+   if (!netif_carrier_ok(dev))
+   nh->nh_flags |= RTNH_F_LINKDOWN;
+
+   if (!(dev->flags & IFF_UP))
+   nh->nh_flags |= RTNH_F_DEAD;
+
+   if ((nh->nh_flags & RTNH_F_LINKDOWN) ||
+   (nh->nh_flags & RTNH_F_DEAD))
+   rt->rt_nhn_alive--;
+
return 0;
 
 errout:
@@ -577,7 +610,7 @@ errout:
 }
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
-struct mpls_nh *nh, int oif,
+struct mpls_nh *nh, int oif, int hops,
 struct nlattr *via, struct nlattr *newdst)
 {
int err = -ENOMEM;
@@ -666,7 +699,7 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
/* neither weighted multipath nor any flags
 * are supported
 */
-   if (rtnh->rtnh_hops || rtnh->rtnh_flags)
+   if (rtnh->rtnh_flags || rtnh->rtnh_flags)
goto errout;
 
attrlen = rtnh_attrlen(rtnh);
@@ -681,8 +714,8 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
goto errout;
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
-   rtnh->rtnh_ifindex, nla_via,
-   nla_newdst);
+   rtnh->rtnh_ifindex, rtnh->rtnh_hops,
+   nla_via, nla_newdst);
if

[PATCH net-next v2] mpls: support for dead routes

2015-11-02 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection

Signed-off-by: Roopa Prabhu 
---
RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the future

v1 to v2:
Fix dead nexthop checks as suggested by dave
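
Comparing the v1 and v2 diffs, the change appears to be folding the two
separate flag tests into one masked test. A tiny sketch showing that
the two forms agree (the function names and NH_* constants are made up
for this illustration):

```c
#define NH_DEAD     1	/* local stand-in for RTNH_F_DEAD */
#define NH_LINKDOWN 16	/* local stand-in for RTNH_F_LINKDOWN */

/* v1 style: test each flag separately. */
static int nh_unusable_v1(unsigned int flags)
{
	return (flags & NH_DEAD) || (flags & NH_LINKDOWN);
}

/* v2 style: one test against the combined mask. */
static int nh_unusable_v2(unsigned int flags)
{
	return (flags & (NH_DEAD | NH_LINKDOWN)) != 0;
}
```

Both return 1 for a dead or linkdown nexthop and 0 otherwise; the
second form is simply shorter.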

 net/mpls/af_mpls.c  | 191 
 net/mpls/internal.h |   3 +
 2 files changed, 166 insertions(+), 28 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..5e88118 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,9 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   for_nexthops(rt) {
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   continue;
+   if (n == nh_index)
+   return nh;
+   n++;
+   } endfor_nexthops(rt);
+
 out:
-   return &rt->rt_nh[nh_index];
+   return &rt->rt_nh[0];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -365,6 +386,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 
max_alen)
 GFP_KERNEL);
if (rt) {
rt->rt_nhn = num_nh;
+   rt->rt_nhn_alive = num_nh;
rt->rt_max_alen = max_alen_aligned;
}
 
@@ -536,6 +558,15 @@ static int mpls_nh_assign_dev(struct net *net, struct 
mpls_route *rt,
 
RCU_INIT_POINTER(nh->nh_dev, dev);
 
+   if (!netif_carrier_ok(dev))
+   nh->nh_flags |= RTNH_F_LINKDOWN;
+
+   if (!(dev->flags & IFF_UP))
+   nh->nh_flags |= RTNH_F_DEAD;
+
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   rt->rt_nhn_alive--;
+
return 0;
 
 errout:
@@ -577,7 +608,7 @@ errout:
 }
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
-struct mpls_nh *nh, int oif,
+struct mpls_nh *nh, int oif, int hops,
 struct nlattr *via, struct nlattr *newdst)
 {
int err = -ENOMEM;
@@ -666,7 +697,7 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
/* neither weighted multipath nor any flags
 * are supported
 */
-   if (rtnh->rtnh_hops || rtnh->rtnh_flags)
+   if (rtnh->rtnh_flags || rtnh->rtnh_flags)
goto errout;
 
attrlen = rtnh_attrlen(rtnh);
@@ -681,8 +712,8 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
goto errout;
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
-   rtnh->rtnh_ifindex, nla_via,
-   nla_newdst);
+   rtnh->rtnh_ifindex, rtnh->rtnh_hops,
+   nla_via, nla_newdst);
if (err)

[PATCH net-next v3] mpls: support for dead routes

2015-11-03 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection

Signed-off-by: Roopa Prabhu 
---
Dave, I know you are only taking bug fixes currently. This patch
is borderline a bug fix because Eric thinks it is critical for
mpls multipath routes. I can resubmit it as a bug fix against net
when it is time if you would prefer that. Thanks!

RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the future

v1 to v2:
Fix dead nexthop checks as suggested by dave

v2 to v3:
Fix duplicated argument reported by kbuild test robot



 net/mpls/af_mpls.c  | 189 
 net/mpls/internal.h |   3 +
 2 files changed, 165 insertions(+), 27 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..8054904 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,9 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   for_nexthops(rt) {
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   continue;
+   if (n == nh_index)
+   return nh;
+   n++;
+   } endfor_nexthops(rt);
+
 out:
-   return &rt->rt_nh[nh_index];
+   return &rt->rt_nh[0];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -365,6 +386,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 
max_alen)
 GFP_KERNEL);
if (rt) {
rt->rt_nhn = num_nh;
+   rt->rt_nhn_alive = num_nh;
rt->rt_max_alen = max_alen_aligned;
}
 
@@ -536,6 +558,15 @@ static int mpls_nh_assign_dev(struct net *net, struct 
mpls_route *rt,
 
RCU_INIT_POINTER(nh->nh_dev, dev);
 
+   if (!netif_carrier_ok(dev))
+   nh->nh_flags |= RTNH_F_LINKDOWN;
+
+   if (!(dev->flags & IFF_UP))
+   nh->nh_flags |= RTNH_F_DEAD;
+
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   rt->rt_nhn_alive--;
+
return 0;
 
 errout:
@@ -577,7 +608,7 @@ errout:
 }
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
-struct mpls_nh *nh, int oif,
+struct mpls_nh *nh, int oif, int hops,
 struct nlattr *via, struct nlattr *newdst)
 {
int err = -ENOMEM;
@@ -681,8 +712,8 @@ static int mpls_nh_build_multi(struct mpls_route_config 
*cfg,
goto errout;
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
-   rtnh->rtnh_ifindex, nla_via,
-   nla_newdst);
+   rtnh->rtnh_ifindex, rtnh->rtnh_hops,
+   nla_via, nla_newdst);
if (err)
goto errout;
 
@@ -875,34 +906,100 @@ free:
return ERR_PTR(err);

[PATCH net-next v4] mpls: support for dead routes

2015-11-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
---
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2
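
The flag combinations in the two transcripts above can be summarized
with a short sketch. This is only a model of the observed output,
assuming an admin-down device yields both flags and a carrier loss
yields linkdown alone; nh_flags_for_link() and the NH_* constants are
hypothetical and not taken from the patch.

```c
#include <stdbool.h>

#define NH_DEAD     1	/* local stand-in for RTNH_F_DEAD */
#define NH_LINKDOWN 16	/* local stand-in for RTNH_F_LINKDOWN */

/* Map device state to the nexthop flags shown by "ip -f mpls route show". */
static unsigned int nh_flags_for_link(bool admin_up, bool carrier_ok)
{
	unsigned int flags = 0;

	if (!admin_up)
		flags |= NH_DEAD | NH_LINKDOWN;	/* "dead linkdown" */
	else if (!carrier_ok)
		flags |= NH_LINKDOWN;		/* "linkdown" */

	return flags;
}
```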

Signed-off-by: Roopa Prabhu 
---
RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the
future

v1 to v2:
Fix dead nexthop checks as suggested by dave

v2 to v3:
Fix duplicated argument reported by kbuild test robot

v3 - v4:
- removed per route rt_flags and derive it from the nh_flags during dumps
- use kmemdup to make a copy of the route during route updates
  due to link events

 net/mpls/af_mpls.c  | 248 
 net/mpls/internal.h |   2 +
 2 files changed, 213 insertions(+), 37 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..c72c8e1 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,7 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index = 0;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   if (rt->rt_nhn_alive == rt->rt_nhn)
+   goto out;
+   for_nexthops(rt) {
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   continue;
+   if (n == nh_index)
+   return nh;
+   n++;
+   } endfor_nexthops(rt);
+
 out:
return &rt->rt_nh[nh_index];
 }
@@ -354,17 +377,24 @@ struct mpls_route_config {
int rc_mp_len;
 };
 
+static inline int mpls_route_alloc_size(int num_nh, u8 max_alen_aligned)
+{
+   struct mpls_route *rt;
+
+   return (ALIGN(sizeof(*rt) + num_nh * sizeof(*rt->rt_nh),
+ VIA_ALEN_ALIGN

[PATCH net-next v5] mpls: support for dead routes

2015-11-24 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
---
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

Signed-off-by: Roopa Prabhu 
---

RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the
future

v1 to v2:
Fix dead nexthop checks as suggested by dave

v2 to v3:
Fix duplicated argument reported by kbuild test robot

v3 - v4:
- removed per route rt_flags and derive it from the nh_flags during dumps
- use kmemdup to make a copy of the route during route updates
  due to link events

v4 - v5:
- if kmemdup fails, modify the original route in place. This is a
corner case and only side effect is that in the remote case
of kmemdup failure, the changes will not be atomically visible
to datapath.
- replace for_nexthops with change_nexthops in a bunch of places.
- fix indent
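
The copy-then-publish approach with an in-place fallback, as described
in the v4 and v5 notes, might look roughly like this in userspace. It
is a sketch under stated assumptions: malloc/memcpy stand in for
kmemdup, a plain pointer store stands in for the RCU pointer update,
and free() stands in for a deferred free. struct route, route_mark()
and failing_alloc() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

struct route {
	int nhn;
	unsigned int nh_flags[4];
};

/* Allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Set a flag on one nexthop. Returns 1 if a fresh copy was published,
 * 0 if allocation failed and the route was modified in place.
 * *slot must point to a malloc()ed route.
 */
static int route_mark(struct route **slot, int idx, unsigned int flag,
		      alloc_fn alloc)
{
	struct route *old = *slot;
	struct route *copy = alloc(sizeof(*copy));

	if (!copy) {
		/* Fallback: readers may briefly see a partial update. */
		old->nh_flags[idx] |= flag;
		return 0;
	}

	memcpy(copy, old, sizeof(*copy));
	copy->nh_flags[idx] |= flag;
	*slot = copy;	/* kernel analogue: publish the new pointer */
	free(old);	/* kernel analogue: free after a grace period */
	return 1;
}

/* Always-failing allocator used to simulate memory pressure. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

The fallback keeps the flag update from being lost, at the cost of
atomic visibility to the datapath in that rare case.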


 net/mpls/af_mpls.c  | 250 
 net/mpls/internal.h |   2 +
 2 files changed, 215 insertions(+), 37 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..2248015 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,7 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index = 0;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   if (rt->rt_nhn_alive == rt->rt_nhn)
+   goto out;
+   for_nexthops(rt) {
+   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   continue;
+   if (n == nh_index)
+   return nh;
+   n++;
+   } endfor_nexthops(rt);
+
 out:
return &

[PATCH net-next v6] mpls: support for dead routes

2015-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
---
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1:  mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6  dev swp2

Signed-off-by: Roopa Prabhu 
---

RFC to v1:
Addressed a few comments from Eric and Robert:
- remove support for weighted nexthops
- Use rt_nhn_alive in the rt structure to keep count of alive
routes.
What I have not done is: sort nexthops on link events.
I am not comfortable recreating or sorting nexthops on
every carrier change. This leaves scope for optimizing in the
future

v1 to v2:
Fix dead nexthop checks as suggested by dave

v2 to v3:
Fix duplicated argument reported by kbuild test robot

v3 - v4:
- removed per route rt_flags and derive it from the nh_flags during dumps
- use kmemdup to make a copy of the route during route updates
  due to link events

v4 - v5:
- if kmemdup fails, modify the original route in place. This is a
corner case and only side effect is that in the remote case
of kmemdup failure, the changes will not be atomically visible
to datapath.
- replace for_nexthops with change_nexthops in a bunch of places.
- fix indent

v5 - v6
- update routes in place in mpls netdev notifier handlers. 
the additional kmemdup complexity and failure path recovery
does not seem necessary to support the transient atomic update
case


 net/mpls/af_mpls.c  | 184 
 net/mpls/internal.h |   2 +
 2 files changed, 158 insertions(+), 28 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750..ab01d9e 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt,
+  struct sk_buff *skb, bool bos)
 {
struct mpls_entry_decoded dec;
struct mpls_shim_hdr *hdr;
bool eli_seen = false;
int label_index;
-   int nh_index = 0;
u32 hash = 0;
 
-   /* No need to look further into packet if there's only
-* one path
-*/
-   if (rt->rt_nhn == 1)
-   goto out;
-
for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
 label_index++) {
if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
@@ -165,7 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct 
mpls_route *rt,
}
}
 
-   nh_index = hash % rt->rt_nhn;
+   return hash;
+}
+
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
+{
+   u32 hash = 0;
+   int nh_index = 0;
+   int n = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   if (rt->rt_nhn_alive <= 0)
+   return NULL;
+
+   hash = mpls_multipath_hash(rt, skb, bos);
+   nh_index = hash % rt->rt_nhn_alive;
+   if (rt->rt_nhn_alive == rt->rt_nhn)
+   goto out;
+   for_nexthops(rt) {
+   if (

[PATCH net-next 1/3] mpls: move mpls_route nexthop fields to a new nhlfe struct

2015-08-11 Thread Roopa Prabhu

From: Roopa Prabhu 

Moves mpls_route nexthop fields to a new mpls_nhlfe
struct. mpls_nhlfe represents an mpls nexthop label forwarding entry.
This prepares the mpls route structure for multipath support.

In the process, moves the mpls_route structure into internal.h.
Moves some of the code from mpls_route_add into a separate mpls
nhlfe build function. Changes mpls_rt_alloc to take the number of
nexthops as an argument.

A mpls route can point to multiple mpls_nhlfe. This patch
does not support multipath yet, hence the rest of the changes
assume that a mpls route points to a single mpls_nhlfe
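
The allocation change described above, where the route is allocated
together with room for a given number of nexthops, can be sketched in
userspace C roughly as follows. This is only an illustration under
stated assumptions: `route`, `nhlfe`, `route_alloc` and
`route_header_size` are hypothetical, reduced stand-ins for the kernel
structures and helpers, and `calloc` takes the place of `kzalloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NEW_LABELS 2

/* Hypothetical, reduced stand-in for struct mpls_nhlfe. */
struct nhlfe {
	uint32_t nh_label[MAX_NEW_LABELS];
	uint8_t  nh_labels;	/* number of labels pushed by this nexthop */
};

/* Hypothetical, reduced stand-in for struct mpls_route. */
struct route {
	uint8_t      rt_nhn;	/* number of nexthops that follow */
	struct nhlfe rt_nh[];	/* nexthops stored inline after the route */
};

/* Allocate a zeroed route with room for num_nh nexthops in one block. */
static struct route *route_alloc(uint8_t num_nh)
{
	struct route *rt;

	rt = calloc(1, sizeof(*rt) + num_nh * sizeof(struct nhlfe));
	if (rt)
		rt->rt_nhn = num_nh;
	return rt;
}

/* Size in bytes of the label stack added by the first nexthop
 * (each MPLS shim header is 4 bytes).
 */
static unsigned int route_header_size(const struct route *rt)
{
	assert(rt->rt_nhn > 0);
	return rt->rt_nh[0].nh_labels * 4u;
}
```

With a single nexthop, `rt->rt_nh->nh_labels` and
`rt->rt_nh[0].nh_labels` name the same field, which is why the rest of
the patch can keep treating the route as having one nexthop.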

Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c  |  225 ---
 net/mpls/internal.h |   35 
 2 files changed, 158 insertions(+), 102 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8c5707d..cf86e9d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -21,35 +21,6 @@
 #endif
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -83,7 +54,7 @@ EXPORT_SYMBOL_GPL(mpls_output_possible);
 static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return rt->rt_nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -124,7 +95,7 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
if (!pskb_may_pull(skb, 12))
return false;
 
-   payload_type = rt->rt_payload_type;
+   payload_type = rt->rt_nh->nh_payload_type;
if (payload_type == MPT_UNSPEC)
payload_type = ip_hdr(skb)->version;
 
@@ -197,7 +168,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
goto drop;
 
/* Find the output device */
-   out_dev = rcu_dereference(rt->rt_dev);
+   out_dev = rcu_dereference(rt->rt_nh->nh_dev);
if (!mpls_output_possible(out_dev))
goto drop;
 
@@ -240,13 +211,15 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
/* Push the new labels */
hdr = mpls_hdr(skb);
bos = dec.bos;
-   for (i = rt->rt_labels - 1; i >= 0; i--) {
-   hdr[i] = mpls_entry_encode(rt->rt_label[i], dec.ttl, 0, bos);
+   for (i = rt->rt_nh->nh_labels - 1; i >= 0; i--) {
+   hdr[i] = mpls_entry_encode(rt->rt_nh->nh_label[i],
+  dec.ttl, 0, bos);
bos = false;
}
}
 
-   err = neigh_xmit(rt->rt_via_table, out_dev, rt->rt_via, skb);
+   err = neigh_xmit(rt->rt_nh->nh_via_table, out_dev, rt->rt_nh->nh_via,
+skb);
if (err)
net_dbg_ratelimited("%s: packet transmission failed: %d\n",
__func__, err);
@@ -281,13 +254,15 @@ struct mpls_route_config {
struct nl_info  rc_nlinfo;
 };
 
-static struct mpls_route *mpls_rt_alloc(size_t alen)
+static struct mpls_route *mpls_rt_alloc(int num_nh)
 {
struct mpls_route *rt;
 
-   rt = kzalloc(sizeof(*rt) + alen, GFP_KERNEL);
+   rt = kzalloc(sizeof(*rt) + (num_nh * sizeof(struct mpls_nhlfe)),
+GFP_KERNEL);
if (rt)
-   rt->rt_via_alen = alen;
+   rt->rt_nhn = num_nh;
+
return rt;
 }
 
@@ -322,7 +297,7 @@ static void mpls_route_update(struct net *net, unsigned index,
 
platform_label = rtnl_dereference(net->mpls.platform_label);
rt = rtnl_dereference(platform_label[index]);
-   if (!dev || (rt && (rtnl_dereference(rt->rt_dev) == dev))) {
+   if (!

[PATCH net-next 0/3] mpls: multipath support

2015-08-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch series adds multipath support to mpls routes.

It resembles ipv4 multipath support. The multipath route nexthop
selection algorithm is the same code as in ipv4 fib code.

I understand that the multipath algorithm in ipv4 is undergoing
some changes and will move mpls to a similar algorithm if applicable
once those get merged.

mpls multipath support can be moved under CONFIG_MPLS_ROUTE_MULTIPATH if
needed similar to CONFIG_IP_ROUTE_MULTIPATH. I started with that
but that resulted in too many #ifdef CONFIG_MPLS_ROUTE_MULTIPATH
throughout the af_mpls code. If there is a strong reason
to introduce a config option, I will respin v2 with
CONFIG_MPLS_ROUTE_MULTIPATH. These multipath patches do not introduce
any UAPI changes.

example iproute2 usage:
$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 300 via inet 10.1.1.6 dev swp2

$ip -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 300 via inet 10.1.1.6  dev swp2


Roopa Prabhu (3):
  mpls: move mpls_route nexthop fields to a new nhlfe struct
  mpls: consistently use u8 to store number of labels
  mpls: add multipath route support

 include/net/mpls_iptunnel.h |2 +-
 net/mpls/af_mpls.c  |  519 ---
 net/mpls/internal.h |   44 +++-
 3 files changed, 437 insertions(+), 128 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next 3/3] mpls: add multipath route support

2015-08-11 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds support for MPLS multipath routes.
Supports parse/fill of the RTA_MULTIPATH netlink attribute
for multipath routes, similar to ipv4 fib. Mostly based on
multipath handling in ipv4 fib code.

The multipath route nexthop selection algorithm is the same
code as in ipv4 fib.

This patch also adds new functions to parse multipath attributes
from route config into mpls_nhlfe.

Note that it also simplifies mpls_route_update by removing the
handling of route updates based on the dev argument. The function
was not being used for route updates based on dev, and if we do
need to support route updates based on dev in the future, it will
have to be done differently.
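
The weighted selection borrowed from ipv4 fib can be sketched in
userspace C as below. This is a simplified illustration, not the patch
itself: `struct nh`, `struct route` and `select_multipath` are
hypothetical names, the nexthops live in a fixed array, the spinlock is
omitted, and a caller-supplied `seed` stands in for the `jiffies` value
the patch uses.

```c
#include <assert.h>

#define MAX_NH 8

/* Hypothetical, reduced nexthop: configured weight and remaining credit. */
struct nh {
	int weight;
	int power;
};

struct route {
	int nhn;		/* number of nexthops in use */
	int power;		/* sum of the remaining credits */
	struct nh nh[MAX_NH];
};

/* Pick a nexthop index. Returns 0 when no nexthop has any weight,
 * mirroring the patch's fallback for a route that has just become dead.
 */
static int select_multipath(struct route *rt, unsigned long seed)
{
	int i, w;

	assert(rt->nhn > 0 && rt->nhn <= MAX_NH);

	/* Refill every nexthop's credit once the pool is exhausted. */
	if (rt->power <= 0) {
		int power = 0;

		for (i = 0; i < rt->nhn; i++) {
			power += rt->nh[i].weight;
			rt->nh[i].power = rt->nh[i].weight;
		}
		rt->power = power;
		if (power <= 0)
			return 0;
	}

	w = (int)(seed % (unsigned long)rt->power);

	/* Spend one credit on the first nexthop whose remaining credit
	 * covers w.
	 */
	for (i = 0; i < rt->nhn; i++) {
		if (rt->nh[i].power) {
			w -= rt->nh[i].power;
			if (w <= 0) {
				rt->nh[i].power--;
				rt->power--;
				return i;
			}
		}
	}
	return 0;
}
```

Because every pick spends exactly one credit, a nexthop with weight 2
is chosen twice as often as one with weight 1 over each full refill
cycle, whatever the seed values are.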

Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c  |  378 +--
 net/mpls/internal.h |   19 +++
 2 files changed, 323 insertions(+), 74 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index eb089ef..de5ae29 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,10 +19,12 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
+static DEFINE_SPINLOCK(mpls_multipath_lock);
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -51,10 +53,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nhlfe_header_size(const struct mpls_nhlfe *nhlfe)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_nh->nh_labels * sizeof(struct mpls_shim_hdr);
+   return nhlfe->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -76,7 +78,52 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
+/* This is a cut/copy/modify from fib_select_multipath */
+static void mpls_select_multipath(struct mpls_route *rt, int *nhidx)
+{
+   int w;
+
+   spin_lock_bh(&mpls_multipath_lock);
+   if (rt->rt_power <= 0) {
+   int power = 0;
+
+   change_nexthops(rt) {
+   power += nhlfe->nh_weight;
+   nhlfe->nh_power = nhlfe->nh_weight;
+   } endfor_nexthops(rt);
+   rt->rt_power = power;
+   if (power <= 0) {
+   spin_unlock_bh(&mpls_multipath_lock);
+   /* Race condition: route has just become dead. */
+   *nhidx = 0;
+   return;
+   }
+   }
+
+   /* w should be random number [0..rt->rt_power-1],
+* it is pretty bad approximation.
+*/
+   w = jiffies % rt->rt_power;
+
+   change_nexthops(rt) {
+   if (nhlfe->nh_power) {
+   w -= nhlfe->nh_power;
+   if (w <= 0) {
+   nhlfe->nh_power--;
+   rt->rt_power--;
+   *nhidx = nhsel;
+   spin_unlock_bh(&mpls_multipath_lock);
+   return;
+   }
+   }
+   } endfor_nexthops(rt);
+
+   /* Race condition: route has just become dead. */
+   *nhidx = 0;
+   spin_unlock_bh(&mpls_multipath_lock);
+}
+
+static bool mpls_egress(struct mpls_nhlfe *nhlfe, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
enum mpls_payload_type payload_type;
@@ -95,7 +142,7 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
if (!pskb_may_pull(skb, 12))
return false;
 
-   payload_type = rt->rt_nh->nh_payload_type;
+   payload_type = nhlfe->nh_payload_type;
if (payload_type == MPT_UNSPEC)
payload_type = ip_hdr(skb)->version;
 
@@ -130,6 +177,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nhlfe *nhlfe;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -137,6 +185,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
int err;
+   int nhidx;
 
/* Careful this entire function runs inside of an rcu critical section */
 
@@ -167,9 +216,12 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
if (!rt)

[PATCH net-next 2/3] mpls: consistently use u8 to store number of labels

2015-08-11 Thread Roopa Prabhu

From: Roopa Prabhu 

Change all types representing the number of labels to u8
to be consistent.

This also changes labels to u8 in the lightweight
mpls_iptunnel_encap structure. This is because the
lightweight mpls iptunnel code shares some of the label
encoding functions like nla_get/put_labels with the af_mpls
code.

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |2 +-
 net/mpls/af_mpls.c  |   10 +-
 net/mpls/internal.h |2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index cf86e9d..eb089ef 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -243,11 +243,11 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 struct mpls_route_config {
u32 rc_protocol;
u32 rc_ifindex;
-   u16 rc_via_table;
-   u16 rc_via_alen;
+   u8  rc_via_table;
+   u8  rc_via_alen;
u8  rc_via[MAX_VIA_ALEN];
+   u8  rc_output_labels;
u32 rc_label;
-   u32 rc_output_labels;
u32 rc_output_label[MAX_NEW_LABELS];
u32 rc_nlflags;
enum mpls_payload_type  rc_payload_type;
@@ -751,7 +751,7 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 EXPORT_SYMBOL_GPL(nla_put_labels);
 
 int nla_get_labels(const struct nlattr *nla,
-  u32 max_labels, u32 *labels, u32 label[])
+  u32 max_labels, u8 *labels, u32 label[])
 {
unsigned len = nla_len(nla);
unsigned nla_labels;
@@ -859,7 +859,7 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
break;
case RTA_DST:
{
-   u32 label_count;
+   u8 label_count;
if (nla_get_labels(nla, 1, &label_count,
   &cfg->rc_label))
goto errout;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index f05e2e8..f5dafcaf 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -87,7 +87,7 @@ static inline struct mpls_entry_decoded mpls_entry_decode(struct mpls_shim_hdr *
 
 int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels,
   const u32 label[]);
-int nla_get_labels(const struct nlattr *nla, u32 max_labels, u32 *labels,
+int nla_get_labels(const struct nlattr *nla, u32 max_labels, u8 *labels,
   u32 label[]);
 bool mpls_output_possible(const struct net_device *dev);
 unsigned int mpls_dev_mtu(const struct net_device *dev);
-- 
1.7.10.4


[PATCH net-next] rtnl_fdb_dump: catch errors from ndo_fdb_dump and ndo_dflt_fdb_dump

2015-09-23 Thread Roopa Prabhu

From: Wilson Kok 

Currently ndo_fdb_dump and ndo_dflt_fdb_dump always return the current
fdb index. They don't return errors, which results in fdb dumps
continuing on errors.

In one such case where bridges and vxlan devices were involved,
the bridge driver returned -EMSGSIZE on a bridge, but since the dump
continued on error, the next vxlan device fdb dump (which was smaller
in size) succeeded, leaving the fdb idx at an inconsistent value. This
resulted in the bridge fdb entry getting skipped and the vxlan
fdb entry getting dumped twice.

This patch changes ndo_fdb_dump() to return the status and pass the
idx by reference for update. The dump aborts if a non-zero status is
returned.
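
The calling convention described above (status as the return value,
idx updated through a pointer, abort on the first error) can be
illustrated with a small userspace sketch. Everything here is a
hypothetical stand-in: `dump_buf` models an skb with limited room,
`dev_fdb_dump` models a driver's ndo_fdb_dump, `dump_all` models the
rtnetlink loop over devices, and `resume` plays the role of
cb->args[0].

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a netlink dump buffer with limited room. */
struct dump_buf {
	int used;
	int capacity;
};

/* Per-device dump: skip entries below 'resume', append the rest, and
 * advance *idx for every entry visited. Returns 0 on success or
 * -EMSGSIZE when the buffer is full, leaving *idx at the entry that
 * did not fit.
 */
static int dev_fdb_dump(struct dump_buf *buf, int n_entries, int resume,
			int *idx)
{
	int i;

	assert(buf && idx);
	for (i = 0; i < n_entries; i++) {
		if (*idx >= resume) {
			if (buf->used == buf->capacity)
				return -EMSGSIZE;
			buf->used++;
		}
		*idx += 1;
	}
	return 0;
}

/* Dump every device in turn and stop at the first error, so that *idx
 * still identifies the first entry that was not dumped.
 */
static int dump_all(struct dump_buf *buf, const int *dev_entries, int n_devs,
		    int resume, int *idx)
{
	int d, err;

	*idx = 0;
	for (d = 0; d < n_devs; d++) {
		err = dev_fdb_dump(buf, dev_entries[d], resume, idx);
		if (err)
			return err;
	}
	return 0;
}
```

If the loop kept going after -EMSGSIZE, a later device with smaller
entries could still fit and advance idx past the entry that failed,
which is the skipped/duplicated entry problem the commit message
describes.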

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  6 +++---
 drivers/net/vxlan.c  | 12 ++--
 include/linux/netdevice.h|  4 ++--
 include/linux/rtnetlink.h|  2 +-
 include/net/switchdev.h  |  4 ++--
 net/bridge/br_fdb.c  | 21 +
 net/bridge/br_private.h  |  2 +-
 net/core/rtnetlink.c | 24 +++-
 net/switchdev/switchdev.c| 12 
 9 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index d448145..546842a 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -433,7 +433,7 @@ static int qlcnic_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 
 static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
struct net_device *netdev,
-   struct net_device *filter_dev, int idx)
+   struct net_device *filter_dev, int *idx)
 {
struct qlcnic_adapter *adapter = netdev_priv(netdev);
 
@@ -442,9 +442,9 @@ static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
 
if ((adapter->flags & QLCNIC_ESWITCH_ENABLED) ||
qlcnic_sriov_check(adapter))
-   idx = ndo_dflt_fdb_dump(skb, ncb, netdev, filter_dev, idx);
+   return ndo_dflt_fdb_dump(skb, ncb, netdev, filter_dev, idx);
 
-   return idx;
+   return 0;
 }
 
 static void qlcnic_82xx_cancel_idc_work(struct qlcnic_adapter *adapter)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bbac1d3..68c92c2 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -911,20 +911,20 @@ out:
 /* Dump forwarding table */
 static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
  struct net_device *dev,
- struct net_device *filter_dev, int idx)
+ struct net_device *filter_dev, int *idx)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
unsigned int h;
+   int err = 0;
 
for (h = 0; h < FDB_HASH_SIZE; ++h) {
struct vxlan_fdb *f;
-   int err;
 
hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
struct vxlan_rdst *rd;
 
list_for_each_entry_rcu(rd, &f->remotes, list) {
-   if (idx < cb->args[0])
+   if (*idx < cb->args[0])
goto skip;
 
err = vxlan_fdb_info(skb, vxlan, f,
@@ -932,15 +932,15 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 cb->nlh->nlmsg_seq,
 RTM_NEWNEIGH,
 NLM_F_MULTI, rd);
-   if (err < 0)
+   if (err)
goto out;
 skip:
-   ++idx;
+   *idx += 1;
}
}
}
 out:
-   return idx;
+   return err;
 }
 
 /* Watch incoming packets to learn mapping between Ethernet address
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88a0069..87fcacc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -976,7 +976,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * Deletes the FDB entry from dev coresponding to addr.
  * int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb,
  *struct net_device *dev, struct net_device *filter_dev,
- *int idx)
+ *int *idx)
  * Used to add FDB entries to dump requests. Implementers should add
  * entries to skb and update i

[PATCH net-next] bridge: allow adding of fdb entries pointing to the bridge device

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch enables adding of fdb entries pointing to the bridge device.
This can be used to propagate mac address of vlan interfaces
configured on top of the vlan filtering bridge.

Before:
$bridge fdb add 44:38:39:00:27:9f dev bridge
RTNETLINK answers: Invalid argument

After:
$bridge fdb add 44:38:39:00:27:9f dev bridge

Signed-off-by: Roopa Prabhu 
---
 net/bridge/br_fdb.c | 106 
 1 file changed, 83 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 7f7d551..5d0f6f9 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -608,13 +608,14 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
}
 }
 
-static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb)
+static int fdb_to_nud(const struct net_bridge *br,
+ const struct net_bridge_fdb_entry *fdb)
 {
if (fdb->is_local)
return NUD_PERMANENT;
else if (fdb->is_static)
return NUD_NOARP;
-   else if (has_expired(fdb->dst->br, fdb))
+   else if (has_expired(br, fdb))
return NUD_STALE;
else
return NUD_REACHABLE;
@@ -640,7 +641,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
ndm->ndm_flags   = fdb->added_by_external_learn ? NTF_EXT_LEARNED : 0;
ndm->ndm_type= 0;
ndm->ndm_ifindex = fdb->dst ? fdb->dst->dev->ifindex : br->dev->ifindex;
-   ndm->ndm_state   = fdb_to_nud(fdb);
+   ndm->ndm_state   = fdb_to_nud(br, fdb);
 
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->addr))
goto nla_put_failure;
@@ -785,7 +786,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
}
}
 
-   if (fdb_to_nud(fdb) != state) {
+   if (fdb_to_nud(br, fdb) != state) {
if (state & NUD_PERMANENT) {
fdb->is_local = 1;
if (!fdb->is_static) {
@@ -848,6 +849,7 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_bridge_vlan_group *vg;
struct net_bridge_port *p;
struct net_bridge_vlan *v;
+   struct net_bridge *br = NULL;
int err = 0;
 
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
@@ -860,14 +862,19 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
return -EINVAL;
}
 
-   p = br_port_get_rtnl(dev);
-   if (p == NULL) {
-   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
-   dev->name);
-   return -EINVAL;
+   if (dev->priv_flags & IFF_EBRIDGE) {
+   br = netdev_priv(dev);
+   vg = br_vlan_group(br);
+   } else {
+   p = br_port_get_rtnl(dev);
+   if (!p) {
+   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
+   dev->name);
+   return -EINVAL;
+   }
+   vg = nbp_vlan_group(p);
}
 
-   vg = nbp_vlan_group(p);
if (vid) {
v = br_vlan_find(vg, vid);
if (!v) {
@@ -877,9 +884,15 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
}
 
/* VID was specified, so use it. */
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
} else {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, 0);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
if (err || !vg || !vg->num_vlans)
goto out;
 
@@ -888,7 +901,11 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 * vlan on this port.
 */
list_for_each_entry(v, &vg->vlan_list, vlist) {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, v->vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, v->vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags,
+  v->vid);
if (err)
goto out;
}
@@ -898,6 +915,32 @@ out:
return err;
 }
 
+static int fdb_delete_by_addr(struct net_bridge *

[PATCH net-next v2] bridge: allow adding of fdb entries pointing to the bridge device

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch enables adding of fdb entries pointing to the bridge device.
This can be used to propagate mac address of vlan interfaces
configured on top of the vlan filtering bridge.

Before:
$bridge fdb add 44:38:39:00:27:9f dev bridge
RTNETLINK answers: Invalid argument

After:
$bridge fdb add 44:38:39:00:27:9f dev bridge

Signed-off-by: Roopa Prabhu 
Reviewed-by: Nikolay Aleksandrov 
---
 net/bridge/br_fdb.c | 110 
 1 file changed, 85 insertions(+), 25 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 7f7d551..2f8858a 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -608,13 +608,14 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
}
 }
 
-static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb)
+static int fdb_to_nud(const struct net_bridge *br,
+ const struct net_bridge_fdb_entry *fdb)
 {
if (fdb->is_local)
return NUD_PERMANENT;
else if (fdb->is_static)
return NUD_NOARP;
-   else if (has_expired(fdb->dst->br, fdb))
+   else if (has_expired(br, fdb))
return NUD_STALE;
else
return NUD_REACHABLE;
@@ -640,7 +641,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
ndm->ndm_flags   = fdb->added_by_external_learn ? NTF_EXT_LEARNED : 0;
ndm->ndm_type= 0;
ndm->ndm_ifindex = fdb->dst ? fdb->dst->dev->ifindex : br->dev->ifindex;
-   ndm->ndm_state   = fdb_to_nud(fdb);
+   ndm->ndm_state   = fdb_to_nud(br, fdb);
 
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->addr))
goto nla_put_failure;
@@ -785,7 +786,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
}
}
 
-   if (fdb_to_nud(fdb) != state) {
+   if (fdb_to_nud(br, fdb) != state) {
if (state & NUD_PERMANENT) {
fdb->is_local = 1;
if (!fdb->is_static) {
@@ -846,8 +847,9 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
   const unsigned char *addr, u16 vid, u16 nlh_flags)
 {
struct net_bridge_vlan_group *vg;
-   struct net_bridge_port *p;
+   struct net_bridge_port *p = NULL;
struct net_bridge_vlan *v;
+   struct net_bridge *br = NULL;
int err = 0;
 
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
@@ -860,14 +862,19 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
return -EINVAL;
}
 
-   p = br_port_get_rtnl(dev);
-   if (p == NULL) {
-   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
-   dev->name);
-   return -EINVAL;
+   if (dev->priv_flags & IFF_EBRIDGE) {
+   br = netdev_priv(dev);
+   vg = br_vlan_group(br);
+   } else {
+   p = br_port_get_rtnl(dev);
+   if (!p) {
+   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
+   dev->name);
+   return -EINVAL;
+   }
+   vg = nbp_vlan_group(p);
}
 
-   vg = nbp_vlan_group(p);
if (vid) {
v = br_vlan_find(vg, vid);
if (!v) {
@@ -877,9 +884,15 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
}
 
/* VID was specified, so use it. */
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
} else {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, 0);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
if (err || !vg || !vg->num_vlans)
goto out;
 
@@ -888,7 +901,11 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 * vlan on this port.
 */
list_for_each_entry(v, &vg->vlan_list, vlist) {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, v->vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, v->vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags,
+  v->vid);
if (err)

[PATCH net-next v2 2/2] mpls: flow-based multipath selection

2015-10-06 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN), while still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
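
The two hash inputs listed above can be sketched in userspace C as
follows. This is an illustration under stated assumptions rather than
the patch's code: the label stack is passed as an already-decoded
array, `struct l3_tuple`, `mix`, `flow_hash` and `select_path` are
hypothetical names, and `mix` is a simple FNV-1a style step standing in
for the kernel's jhash helpers. The entropy label indicator value (7)
and the first unreserved label (16) follow RFC 6790 and the MPLS
reserved label range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MP_SELECT_LABELS   4
#define LABEL_ENTROPY_IND      7	/* entropy label indicator, RFC 6790 */
#define LABEL_FIRST_UNRESERVED 16

/* Hypothetical stand-in for the L3 header fields that get hashed. */
struct l3_tuple {
	uint32_t src;
	uint32_t dst;
	uint8_t  proto;
};

/* Stand-in mixing step (FNV-1a over the four bytes of 'word'). */
static uint32_t mix(uint32_t hash, uint32_t word)
{
	int i;

	for (i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return hash;
}

/* Hash up to MAX_MP_SELECT_LABELS labels, stopping right after the
 * entropy label; add the L3 3-tuple only if the bottom of the stack
 * was reached within that window.
 */
static uint32_t flow_hash(const uint32_t *labels, size_t n_labels,
			  const struct l3_tuple *l3)
{
	uint32_t hash = 2166136261u;
	bool eli_seen = false;
	size_t i;

	assert(labels || n_labels == 0);

	for (i = 0; i < n_labels && i < MAX_MP_SELECT_LABELS; i++) {
		if (labels[i] == LABEL_ENTROPY_IND) {
			eli_seen = true;
		} else if (labels[i] >= LABEL_FIRST_UNRESERVED) {
			/* Reserved labels are never used as hash keys. */
			hash = mix(hash, labels[i]);
			/* The label after the indicator is the entropy
			 * label: no need to look any deeper.
			 */
			if (eli_seen)
				return hash;
		}
	}

	if (i == n_labels && l3) {
		hash = mix(hash, l3->src);
		hash = mix(hash, l3->dst);
		hash = mix(hash, l3->proto);
	}
	return hash;
}

/* Map a flow hash onto one of n_paths nexthops. */
static unsigned int select_path(uint32_t hash, unsigned int n_paths)
{
	assert(n_paths > 0);
	return hash % n_paths;
}
```

Since the result depends only on per-flow fields, all packets of one
flow map to the same nexthop, which is what avoids reordering within
the flow.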

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 110 -
 1 file changed, 76 insertions(+), 34 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ae9e153..1bef057 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,9 +22,13 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
-static DEFINE_SPINLOCK(mpls_multipath_lock);
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -78,53 +82,91 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-/* This is a cut/copy/modify from fib_select_multipath */
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
struct mpls_nh *nh;
struct mpls_nh *ret_nh;
-   int nhsel = 0;
-   int w;
-
-   spin_lock_bh(&mpls_multipath_lock);
+   bool eli_seen = false;
+   int label_index;
+   int nh_index;
+   u32 hash = 0;
+   int nhsel;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
ret_nh = list_first_entry_or_null(&rt->rt_nhs, struct mpls_nh,
  nh_next);
-   if (rt->rt_power <= 0) {
-   int power = 0;
+   if (rt->rt_nhn == 1)
+   goto out;
 
-   list_for_each_entry(nh, &rt->rt_nhs, nh_next) {
-   power += nh->nh_weight;
-   nh->nh_power = nh->nh_weight;
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   } else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
}
-   rt->rt_power = power;
-   if (power <= 0) {
-   spin_unlock_bh(&mpls_multipath_lock);
-   /* Race condition: route has just become dead. */
-   return ret_nh;
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, siz

[PATCH net-next v2 0/2] mpls: multipath support

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes the following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this series the multipath route nexthop selection
algorithm starts with a simple round robin and is replaced by a hash
based algorithm from Robert Shearman in the last patch

- In the process of restructuring, this patch also consistently uses
u8 to store the number of labels

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (2):
  mpls: multipath support
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


Changes since v1:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
- include flow hash based multipath selection from Robert


 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1


[PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes the following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a list

- In the process of restructuring, this patch also consistently uses
u8 to store the number of labels

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
is a simple round robin picked up from ipv4 fib code and is replaced by
a hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way, and the dev handling gets tricky with multiple nexthops
because a route cannot be matched against any single nexthop's dev. So, this
patch removes the unused 'dev' handling in mpls_route_update.

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 627 +---
 net/mpls/internal.h |  43 ++-
 3 files changed, 516 insertions(+), 156 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8c5707d..ae9e153 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,39 +19,12 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this 
entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
+static DEFINE_SPINLOCK(mpls_multipath_lock);
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -80,10 +53,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,8 +78,58 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
-   struct mpls_entry_decoded dec)
+/* This is a cut/copy/modify from fib_select_multipath */
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   struct mpls_nh *nh;
+   struct mpls_nh *ret_nh;
+   int nhsel = 0;
+   int w;
+
+   spin_lock_bh(&mpls_multipath_lock);
+   ret_nh = list_first_entry_or_null(&rt->rt_nhs, struct mpls_nh,
+ nh_next);
+   if (rt->rt_power <= 0) {
+   int power = 0;
+
+

[PATCH net-next] iproute2: document -timestamp option

2015-10-06 Thread Roopa Prabhu

From: Satish Ashok 

This patch documents the -timestamp option of the bridge and ip commands.

Signed-off-by: Satish Ashok 
---
 man/man8/bridge.8 | 3 +++
 man/man8/ip.8 | 4 
 2 files changed, 7 insertions(+)

diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 5347a56..9f1051c 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -137,6 +137,9 @@ to
 .RI "-n[etns] " NETNS " [ " OPTIONS " ] " OBJECT " { " COMMAND " | "
 .BR help " }"
 
+.TP
+.BR "\-t" , " \-timestamp"
+display current time when using monitor option.
 
 .SH BRIDGE - COMMAND SYNTAX
 
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index e6c2b32..1bdee11 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -175,6 +175,10 @@ executes specified command over all objects, it depends if command supports this
 .BR "\-c" , " -color"
 Use color output.
 
+.TP
+.BR "\-t" , " \-timestamp"
+display current time when using monitor option.
+
 .SH IP - COMMAND SYNTAX
 
 .SS
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH iproute2] bridge: add batch command support

2015-10-07 Thread Roopa Prabhu

From: Wilson Kok 

This patch adds batch mode support to the bridge command.
The implementation follows the existing ip batch code.
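
The batch semantics described in the man page hunk of this patch (stop at the first failing command unless -force is given, and report a non-zero status if anything failed) can be sketched in isolation. This is a minimal userspace sketch, not the iproute2 code: run_batch, cmd_fn and demo_cmd are hypothetical names, and the input is an in-memory array of lines rather than a file read with getcmdline().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command handler: returns 0 on success, non-zero on failure. */
typedef int (*cmd_fn)(const char *line);

/*
 * Run every non-empty line through run_cmd. Stops at the first failure
 * unless force is set. Returns 0 if all commands succeeded, 1 otherwise.
 * If executed is non-NULL it receives the number of commands attempted.
 */
int run_batch(const char *const *lines, size_t n, int force,
	      cmd_fn run_cmd, size_t *executed)
{
	int ret = 0;
	size_t done = 0;

	assert(run_cmd != NULL);

	for (size_t i = 0; i < n; i++) {
		if (lines[i] == NULL || lines[i][0] == '\0')
			continue;	/* blank line */
		done++;
		if (run_cmd(lines[i]) != 0) {
			ret = 1;
			if (!force)
				break;	/* first failure ends the batch */
		}
	}
	if (executed)
		*executed = done;
	return ret;
}

/* Demo handler: any line starting with '!' counts as a failing command. */
int demo_cmd(const char *line)
{
	return line[0] == '!' ? -1 : 0;
}
```

With force set, the loop keeps going after a failure but the return value still records that something went wrong, which matches the documented behaviour of -force.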

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
---
 bridge/bridge.c   | 59 +++
 man/man8/bridge.8 | 11 +++
 2 files changed, 70 insertions(+)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index eaf09c8..c028f6c 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "SNAPSHOT.h"
 #include "utils.h"
@@ -23,6 +24,8 @@ int show_stats;
 int show_details;
 int compress_vlans;
 int timestamp;
+char *batch_file;
+int force;
 const char *_SL_;
 
 static void usage(void) __attribute__((noreturn));
@@ -31,6 +34,7 @@ static void usage(void)
 {
fprintf(stderr,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
+"  bridge [ -force ] -batch filename\n"
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
@@ -71,6 +75,50 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1;
 }
 
+static int batch(const char *name)
+{
+   char *line = NULL;
+   size_t len = 0;
+   int ret = EXIT_SUCCESS;
+
+   if (name && strcmp(name, "-") != 0) {
+   if (freopen(name, "r", stdin) == NULL) {
+   fprintf(stderr,
+   "Cannot open file \"%s\" for reading: %s\n",
+   name, strerror(errno));
+   return EXIT_FAILURE;
+   }
+   }
+
+   if (rtnl_open(&rth, 0) < 0) {
+   fprintf(stderr, "Cannot open rtnetlink\n");
+   return EXIT_FAILURE;
+   }
+
+   cmdlineno = 0;
+   while (getcmdline(&line, &len, stdin) != -1) {
+   char *largv[100];
+   int largc;
+
+   largc = makeargs(line, largv, 100);
+   if (largc == 0)
+   continue;   /* blank line */
+
+   if (do_cmd(largv[0], largc, largv)) {
+   fprintf(stderr, "Command failed %s:%d\n",
+   name, cmdlineno);
+   ret = EXIT_FAILURE;
+   if (!force)
+   break;
+   }
+   }
+   if (line)
+   free(line);
+
+   rtnl_close(&rth);
+   return ret;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -123,6 +171,14 @@ main(int argc, char **argv)
exit(-1);
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
+   } else if (matches(opt, "-force") == 0) {
+   ++force;
+   } else if (matches(opt, "-batch") == 0) {
+   argc--;
+   argv++;
+   if (argc <= 1)
+   usage();
+   batch_file = argv[1];
} else {
fprintf(stderr,
"Option \"%s\" is unknown, try \"bridge help\".\n",
@@ -134,6 +190,9 @@ main(int argc, char **argv)
 
_SL_ = oneline ? "\\" : "\n";
 
+   if (batch_file)
+   return batch(batch_file);
+
if (rtnl_open(&rth, 0) < 0)
exit(1);
 
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 5347a56..d45c728 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -21,6 +21,7 @@ bridge \- show / manipulate bridge addresses and devices
 \fB\-V\fR[\fIersion\fR] |
 \fB\-s\fR[\fItatistics\fR] |
 \fB\-n\fR[\fIetns\fR] name }
+\fB\-b\fR[\fIatch\fR] filename }
 
 .ti -8
 .BR "bridge link set"
@@ -137,6 +138,16 @@ to
 .RI "-n[etns] " NETNS " [ " OPTIONS " ] " OBJECT " { " COMMAND " | "
 .BR help " }"
 
+.TP
+.BR "\-b", " \-batch " 
+Read commands from provided file or standard input and invoke them.
+First failure will cause termination of bridge command.
+
+.TP
+.BR "\-force"
+Don't terminate bridge command on errors in batch mode.
+If there were any errors during execution of the commands, the application
+return code will be non zero.
 
 .SH BRIDGE - COMMAND SYNTAX
 
-- 
1.9.1


[PATCH net-next] ipv6 route: use err pointers instead of returning pointer by reference

2015-10-08 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch makes ip6_route_info_create return an error pointer (ERR_PTR)
instead of returning the rt pointer by reference, as suggested by Dave.
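
For readers unfamiliar with the idiom, the error-pointer convention can be sketched in userspace. The three helpers below are simplified stand-ins for the kernel's ERR_PTR/IS_ERR/PTR_ERR from <linux/err.h>; struct route, route_create and route_add are hypothetical and only mirror the shape of the converted call sites.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object and constructor, for illustration only. */
struct route {
	int prefix_len;
};

struct route *route_create(int prefix_len)
{
	struct route *rt;

	if (prefix_len < 0 || prefix_len > 128)
		return ERR_PTR(-EINVAL);	/* the error travels in the pointer */

	rt = malloc(sizeof(*rt));
	if (!rt)
		return ERR_PTR(-ENOMEM);
	rt->prefix_len = prefix_len;
	return rt;
}

/* Caller pattern: one return value carries either the object or the errno. */
int route_add(int prefix_len)
{
	struct route *rt = route_create(prefix_len);

	if (IS_ERR(rt))
		return (int)PTR_ERR(rt);

	assert(rt->prefix_len == prefix_len);
	free(rt);
	return 0;
}
```

The benefit is that the constructor no longer needs an output parameter, and callers cannot accidentally use a stale pointer after a failure without first checking IS_ERR.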

Signed-off-by: Roopa Prabhu 
---
Dave, sorry about the delay on this one. net-next was closed when I got to it,
and it has been in my queue since then.

 net/ipv6/route.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4320ddc..b45aa49 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1724,21 +1724,21 @@ static int ip6_convert_metrics(struct mx6_config *mxc,
return -EINVAL;
 }
 
-int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
+static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg)
 {
-   int err;
struct net *net = cfg->fc_nlinfo.nl_net;
struct rt6_info *rt = NULL;
struct net_device *dev = NULL;
struct inet6_dev *idev = NULL;
struct fib6_table *table;
int addr_type;
+   int err = -EINVAL;
 
if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
-   return -EINVAL;
+   goto out;
 #ifndef CONFIG_IPV6_SUBTREES
if (cfg->fc_src_len)
-   return -EINVAL;
+   goto out;
 #endif
if (cfg->fc_ifindex) {
err = -ENODEV;
@@ -1958,9 +1958,7 @@ install_route:
 
cfg->fc_nlinfo.nl_net = dev_net(dev);
 
-   *rt_ret = rt;
-
-   return 0;
+   return rt;
 out:
if (dev)
dev_put(dev);
@@ -1969,9 +1967,7 @@ out:
if (rt)
dst_free(&rt->dst);
 
-   *rt_ret = NULL;
-
-   return err;
+   return ERR_PTR(err);
 }
 
 int ip6_route_add(struct fib6_config *cfg)
@@ -1980,9 +1976,12 @@ int ip6_route_add(struct fib6_config *cfg)
struct rt6_info *rt = NULL;
int err;
 
-   err = ip6_route_info_create(cfg, &rt);
-   if (err)
+   rt = ip6_route_info_create(cfg);
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   rt = NULL;
goto out;
+   }
 
err = ip6_convert_metrics(&mxc, cfg);
if (err)
@@ -2871,9 +2870,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
r_cfg.fc_encap_type = nla_get_u16(nla);
}
 
-   err = ip6_route_info_create(&r_cfg, &rt);
-   if (err)
+   rt = ip6_route_info_create(&r_cfg);
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   rt = NULL;
goto cleanup;
+   }
 
err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
if (err) {
-- 
1.9.1


[PATCH net-next v3] bridge: allow adding of fdb entries pointing to the bridge device

2015-10-08 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch enables adding fdb entries that point to the bridge device itself.
This can be used to propagate the MAC addresses of VLAN interfaces
configured on top of a VLAN filtering bridge.

Before:
$bridge fdb add 44:38:39:00:27:9f dev bridge
RTNETLINK answers: Invalid argument

After:
$bridge fdb add 44:38:39:00:27:9f dev bridge

Signed-off-by: Roopa Prabhu 
Reviewed-by: Nikolay Aleksandrov 
---
v1 - v2 : fix kbuild warnings
v2 - v3 : address review comments from Nikolay (use of br_vlan_should_use)

 net/bridge/br_fdb.c  | 122 ++-
 net/bridge/br_vlan.c |   1 +
 2 files changed, 93 insertions(+), 30 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 7f7d551..f43ce05 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -608,13 +608,14 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
}
 }
 
-static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb)
+static int fdb_to_nud(const struct net_bridge *br,
+ const struct net_bridge_fdb_entry *fdb)
 {
if (fdb->is_local)
return NUD_PERMANENT;
else if (fdb->is_static)
return NUD_NOARP;
-   else if (has_expired(fdb->dst->br, fdb))
+   else if (has_expired(br, fdb))
return NUD_STALE;
else
return NUD_REACHABLE;
@@ -640,7 +641,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
ndm->ndm_flags   = fdb->added_by_external_learn ? NTF_EXT_LEARNED : 0;
ndm->ndm_type= 0;
ndm->ndm_ifindex = fdb->dst ? fdb->dst->dev->ifindex : br->dev->ifindex;
-   ndm->ndm_state   = fdb_to_nud(fdb);
+   ndm->ndm_state   = fdb_to_nud(br, fdb);
 
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->addr))
goto nla_put_failure;
@@ -785,7 +786,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
}
}
 
-   if (fdb_to_nud(fdb) != state) {
+   if (fdb_to_nud(br, fdb) != state) {
if (state & NUD_PERMANENT) {
fdb->is_local = 1;
if (!fdb->is_static) {
@@ -846,8 +847,9 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
   const unsigned char *addr, u16 vid, u16 nlh_flags)
 {
struct net_bridge_vlan_group *vg;
-   struct net_bridge_port *p;
+   struct net_bridge_port *p = NULL;
struct net_bridge_vlan *v;
+   struct net_bridge *br = NULL;
int err = 0;
 
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
@@ -860,26 +862,36 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
return -EINVAL;
}
 
-   p = br_port_get_rtnl(dev);
-   if (p == NULL) {
-   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
-   dev->name);
-   return -EINVAL;
+   if (dev->priv_flags & IFF_EBRIDGE) {
+   br = netdev_priv(dev);
+   vg = br_vlan_group(br);
+   } else {
+   p = br_port_get_rtnl(dev);
+   if (!p) {
+   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
+   dev->name);
+   return -EINVAL;
+   }
+   vg = nbp_vlan_group(p);
}
 
-   vg = nbp_vlan_group(p);
if (vid) {
v = br_vlan_find(vg, vid);
-   if (!v) {
-   pr_info("bridge: RTM_NEWNEIGH with unconfigured "
-   "vlan %d on port %s\n", vid, dev->name);
+   if (!v || !br_vlan_should_use(v)) {
+   pr_info("bridge: RTM_NEWNEIGH with unconfigured vlan %d on %s\n", vid, dev->name);
return -EINVAL;
}
 
/* VID was specified, so use it. */
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
} else {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, 0);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
if (err || !vg || !vg->num_vlans)
goto out;
 
@@ -888,7 +900,13 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 * vlan on this port.
 */
list_for_each_entry(v, &vg->vlan_list, vlist) {
-

[PATCH net-next v2] ipv6 route: use err pointers instead of returning pointer by reference

2015-10-10 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch makes ip6_route_info_create return an error pointer (ERR_PTR)
instead of returning the rt pointer by reference, as suggested by Dave.

Signed-off-by: Roopa Prabhu 
---
v1 - v2: remove unnecessary NULL initialization of rt, as pointed out by
Scott Feldman

 net/ipv6/route.c | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4320ddc..db5b54a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1724,21 +1724,21 @@ static int ip6_convert_metrics(struct mx6_config *mxc,
return -EINVAL;
 }
 
-int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
+static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg)
 {
-   int err;
struct net *net = cfg->fc_nlinfo.nl_net;
struct rt6_info *rt = NULL;
struct net_device *dev = NULL;
struct inet6_dev *idev = NULL;
struct fib6_table *table;
int addr_type;
+   int err = -EINVAL;
 
if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
-   return -EINVAL;
+   goto out;
 #ifndef CONFIG_IPV6_SUBTREES
if (cfg->fc_src_len)
-   return -EINVAL;
+   goto out;
 #endif
if (cfg->fc_ifindex) {
err = -ENODEV;
@@ -1958,9 +1958,7 @@ install_route:
 
cfg->fc_nlinfo.nl_net = dev_net(dev);
 
-   *rt_ret = rt;
-
-   return 0;
+   return rt;
 out:
if (dev)
dev_put(dev);
@@ -1969,20 +1967,21 @@ out:
if (rt)
dst_free(&rt->dst);
 
-   *rt_ret = NULL;
-
-   return err;
+   return ERR_PTR(err);
 }
 
 int ip6_route_add(struct fib6_config *cfg)
 {
struct mx6_config mxc = { .mx = NULL, };
-   struct rt6_info *rt = NULL;
+   struct rt6_info *rt;
int err;
 
-   err = ip6_route_info_create(cfg, &rt);
-   if (err)
+   rt = ip6_route_info_create(cfg);
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   rt = NULL;
goto out;
+   }
 
err = ip6_convert_metrics(&mxc, cfg);
if (err)
@@ -2871,9 +2870,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
r_cfg.fc_encap_type = nla_get_u16(nla);
}
 
-   err = ip6_route_info_create(&r_cfg, &rt);
-   if (err)
+   rt = ip6_route_info_create(&r_cfg);
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   rt = NULL;
goto cleanup;
+   }
 
err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
if (err) {
-- 
1.9.1


[PATCH net-next v3 1/2] mpls: multipath route support

2015-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes the following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents an mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- An mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as an array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all label counts to u8

- Adds support to parse/fill the RTA_MULTIPATH netlink attribute for
multipath routes, similar to the ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way, and the dev handling gets tricky with multiple nexthops:
a route cannot be matched against any single nexthop's dev. So this patch
removes the unused 'dev' handling in mpls_route_update.
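
The route/nexthop split listed above can be pictured with a small userspace sketch. The real definitions live in net/mpls/internal.h, which is not shown in this mail, so the layout below is an assumption: only the names rt_nhn, rt_nh, nh_labels and MAX_NEW_LABELS appear in the diffs, while nh_label, nh_ifindex and mpls_rt_alloc are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NEW_LABELS 2

/* Illustrative nexthop: one label forwarding entry. */
struct mpls_nh {
	uint32_t nh_label[MAX_NEW_LABELS];
	uint8_t  nh_labels;	/* number of labels, small enough for a u8 */
	int      nh_ifindex;	/* hypothetical stand-in for the output device */
};

/* Illustrative route: a small header followed by rt_nhn nexthops. */
struct mpls_route {
	uint8_t rt_nhn;		/* number of nexthops */
	struct mpls_nh rt_nh[];	/* array allocated together with the route */
};

/* Allocate a zeroed route with room for num_nh nexthops (NULL on failure). */
struct mpls_route *mpls_rt_alloc(uint8_t num_nh)
{
	struct mpls_route *rt;

	assert(num_nh > 0);
	rt = calloc(1, sizeof(*rt) + num_nh * sizeof(struct mpls_nh));
	if (rt)
		rt->rt_nhn = num_nh;
	return rt;
}

/* Placeholder selection, as in this patch: always the first nexthop. */
struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
{
	return &rt->rt_nh[0];
}
```

Keeping the nexthops in one contiguous array means a single allocation per route and index-based selection, which is what makes a later "hash modulo rt_nhn" lookup straightforward.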

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 500 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 404 insertions(+), 150 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..4d819df 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this 
entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return &rt->rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -166,6 +145,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
int err;
+   int nhid

[PATCH net-next v3 0/2] mpls: multipath support

2015-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch series adds support for MPLS multipath routes.

Includes the following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents an mpls nexthop label forwarding entry

- Adds support to parse/fill the RTA_MULTIPATH netlink attribute for
multipath routes, similar to the ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
label counts to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (2):
  mpls: multipath support
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman


 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1


[PATCH net-next v3 2/2] mpls: flow-based multipath selection

2015-10-11 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic that is sensitive to reordering
within a flow (e.g. TCP, L2VPN), whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
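
The selection scheme described above can be sketched in userspace as follows. This is not the kernel code: mix32 is an FNV-1a style stand-in for the jhash helpers used in the patch, struct flow_key and select_nexthop are hypothetical names, and the entropy label handling is left out for brevity. It only shows the core idea that the same flow always hashes to the same nexthop index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MP_SELECT_LABELS 4

/* FNV-1a style mixing step; a stand-in for the kernel's jhash helpers. */
static uint32_t mix32(uint32_t hash, uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return hash;
}

/* Hypothetical flow key, as it might be extracted from a packet. */
struct flow_key {
	uint32_t labels[MAX_MP_SELECT_LABELS];
	size_t   num_labels;
	uint32_t saddr, daddr;	/* L3 source and destination */
	uint8_t  proto;		/* L3 protocol number */
};

/* Map a flow to one of num_nh nexthops; equal keys give equal indexes. */
unsigned int select_nexthop(const struct flow_key *key, unsigned int num_nh)
{
	uint32_t hash = 2166136261u;

	assert(num_nh > 0);
	if (num_nh == 1)
		return 0;	/* only one path: no need to hash */

	for (size_t i = 0; i < key->num_labels && i < MAX_MP_SELECT_LABELS; i++)
		hash = mix32(hash, key->labels[i]);
	hash = mix32(hash, key->saddr);
	hash = mix32(hash, key->daddr);
	hash = mix32(hash, key->proto);

	return hash % num_nh;
}
```

Because the index depends only on the key, packets of one flow stay on one path and are not reordered, while different flows spread across the available nexthops.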

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 88 ++
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 4d819df..15dd2eb 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return &rt->rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   } else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
+   hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return &rt->rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -145,7 +218,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
int er

[PATCH net-next iproute2] support batching of ip route get commands

2015-07-15 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch replaces exits with returns in
ip route get command handling. This allows batching
of ip route get commands.

$cat route_get_batch.txt
route get 10.0.14.2
route get 12.0.14.2
route get 10.0.14.4

$ip -batch route_get_batch.txt
local 10.0.14.2 dev lo  src 10.0.14.2
cache 
12.0.14.2 via 192.168.0.2 dev eth0  src 192.168.0.15
cache
10.0.14.4 dev dummy0  src 10.0.14.2
cache

Signed-off-by: Roopa Prabhu 
---
 ip/iproute.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 41dea8f..8f49e62 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1682,15 +1682,15 @@ static int iproute_get(int argc, char **argv)
req.n.nlmsg_type = RTM_GETROUTE;
 
if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0)
-   exit(2);
+   return -2;
}
 
if (print_route(NULL, &req.n, (void*)stdout) < 0) {
fprintf(stderr, "An error :-)\n");
-   exit(1);
+   return -1;
}
 
-   exit(0);
+   return 0;
 }
 
 static int restore_handler(const struct sockaddr_nl *nl,
-- 
1.7.10.4


[PATCH net-next] mpls: make RTA_OIF optional

2015-07-21 Thread Roopa Prabhu

From: Roopa Prabhu 

If the user did not specify an oif, try to derive it from the via address.
If no device can be found, return -ENODEV.
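
The fallback order can be sketched outside the kernel as below. Everything here is hypothetical scaffolding (enum via_table, struct route_cfg, the lookup callbacks and the use of plain ifindex integers instead of struct net_device); it only illustrates the decision: an explicit oif wins, otherwise the via address family decides which lookup is tried, and a link-layer via yields no device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the neighbour table identifiers. */
enum via_table { VIA_INET, VIA_INET6, VIA_LINK };

/* Hypothetical route request: ifindex 0 means "no oif given". */
struct route_cfg {
	int ifindex;
	enum via_table via_table;
	const void *via;
};

/* Hypothetical lookup hook: returns an ifindex, or 0 if none is found. */
typedef int (*fib_lookup_fn)(const void *via);

/* Returns the output ifindex, or 0 when no device can be determined. */
int find_outdev(const struct route_cfg *cfg,
		fib_lookup_fn lookup4, fib_lookup_fn lookup6)
{
	assert(cfg != NULL);

	if (cfg->ifindex)
		return cfg->ifindex;	/* explicit oif wins */

	switch (cfg->via_table) {
	case VIA_INET:
		return lookup4(cfg->via);
	case VIA_INET6:
		return lookup6(cfg->via);
	case VIA_LINK:
		break;			/* a link-layer via cannot be routed */
	}
	return 0;
}

/* Demo lookups with fixed answers. */
int demo_lookup4(const void *via) { (void)via; return 2; }
int demo_lookup6(const void *via) { (void)via; return 0; }
```

A caller would translate a result of 0 into -ENODEV, which is the behaviour the commit message describes.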

Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c |   67 +++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..4cd3789 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #define LABEL_NOT_SPECIFIED (1<<20)
@@ -327,6 +328,70 @@ static unsigned find_free_label(struct net *net)
return LABEL_NOT_SPECIFIED;
 }
 
+static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct rtable *rt;
+   struct in_addr daddr;
+
+   memcpy(&daddr, addr, sizeof(struct in_addr));
+   rt = ip_route_output(net, daddr.s_addr, 0, 0, 0);
+   if (IS_ERR(rt))
+   goto errout;
+
+   dev = rt->dst.dev;
+   dev_hold(dev);
+
+   ip_rt_put(rt);
+
+errout:
+   return dev;
+}
+
+static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct dst_entry *dst;
+   struct flowi6 fl6;
+
+   memset(&fl6, 0, sizeof(fl6));
+   memcpy(&fl6.daddr, addr, sizeof(struct in6_addr));
+   dst = ip6_route_output(net, NULL, &fl6);
+   if (dst->error)
+   goto errout;
+
+   dev = dst->dev;
+   dev_hold(dev);
+
+errout:
+   dst_release(dst);
+
+   return dev;
+}
+
+static struct net_device *find_outdev(struct net *net,
+ struct mpls_route_config *cfg)
+{
+   struct net_device *dev = NULL;
+
+   if (!cfg->rc_ifindex) {
+   switch (cfg->rc_via_table) {
+   case NEIGH_ARP_TABLE:
+   dev = inet_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_ND_TABLE:
+   dev = inet6_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_LINK_TABLE:
+   break;
+   }
+   } else {
+   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   }
+
+   return dev;
+}
+
 static int mpls_route_add(struct mpls_route_config *cfg)
 {
struct mpls_route __rcu **platform_label;
@@ -358,7 +423,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
goto errout;
 
err = -ENODEV;
-   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   dev = find_outdev(net, cfg);
if (!dev)
goto errout;
 
-- 
1.7.10.4


[PATCH net-next] mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference

2015-07-21 Thread Roopa Prabhu

From: Roopa Prabhu 

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

Remove the incorrect rcu_dereference, possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot 
Signed-off-by: Roopa Prabhu 
---
 net/mpls/mpls_iptunnel.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index eea096f..276f8c9 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -70,7 +70,7 @@ int mpls_output(struct sock *sk, struct sk_buff *skb)
skb_orphan(skb);
 
/* Find the output device */
-   out_dev = rcu_dereference(dst->dev);
+   out_dev = dst->dev;
if (!mpls_output_possible(out_dev) ||
!lwtstate || skb_warn_if_lro(skb))
goto drop;
-- 
1.7.10.4

