date:20091126

Re: [PATCHv3 0/4] macvlan: add vepa and bridge mode

2009-11-26 Thread David Miller

From: Patrick McHardy 
Date: Thu, 26 Nov 2009 17:26:17 +0100

> Arnd Bergmann wrote:
>> Version 2 description:
>> The patch to iproute2 has not changed, so I'm not including
>> it this time. Patch 4/4 (the netlink interface) is basically
>> unchanged as well but included for completeness.
>> 
>> The other changes have moved forward a bit, to the point where
>> I find them a lot cleaner and am more confident in the code
>> being ready for inclusion. The implementation hardly resembles
>> Eric's original patch now, so I've dropped his signed-off-by.
>> 
>> Please take a look and ack if you are happy so we can get it
>> into 2.6.33.
> 
> Looks good to me, nice work.
> 
> Acked-by: Patrick McHardy 
> 
> for the entire series.

All applied to net-next-2.6, thanks everyone!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Patrick McHardy

Arnd Bergmann wrote:
> On Thursday 26 November 2009, Patrick McHardy wrote:
>> In addition to those already handled, I'd say
>>
>> - priority: affects qdisc classification, may refer to classes of the
>>   old namespace
>> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
>> - nf_trace: might trigger packet tracing
>> - nf_bridge: contains references to network devices in the old NS,
>>   also indicates packet was bridged
>> - iif: index is only valid in the originating namespace
>> - probably secmark.
> 
> ok
> 
>> - tc_index: classification result, should only be set in the namespace
>>   of the classifier
>> - tc_verd: RTTL etc. should begin at zero again
> 
> Wouldn't that defeat the purpose of RTTL? If you create a loop
> across two devices in different namespaces, it may no longer get
> detected. Or is that a different problem again?

Hmm, good point, that would indeed be possible. OTOH, using ingress
filtering in one namespace currently might cause the packet to get
dropped in a different namespace because the TTL runs out. For now
I'd suggest going the safe route and keeping the TTL intact until we
can come up with something better.
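
To make the trade-off concrete, here is a rough userspace sketch (not kernel code). It assumes a simple per-packet redirect counter that is checked against a limit on every hop; the names `LOOP_LIMIT` and `hops_until_drop()` and the limit value are made up for illustration. It only models the question raised above: if the counter is zeroed each time the packet crosses into another namespace, a loop spanning two namespaces is never detected.

```c
#include <assert.h>
#include <stdbool.h>

#define LOOP_LIMIT 4	/* hypothetical redirect limit, for illustration only */

/*
 * Simulate a packet bouncing between two devices that sit in
 * different namespaces. Returns the number of hops taken before the
 * loop is detected, or max_hops if it is never detected.
 */
static int hops_until_drop(bool reset_on_ns_cross, int max_hops)
{
	int ttl = 0;
	int hops;

	assert(max_hops > 0);

	for (hops = 0; hops < max_hops; hops++) {
		if (ttl > LOOP_LIMIT)
			return hops;	/* loop detected, packet dropped */
		ttl++;			/* one more redirect */
		if (reset_on_ns_cross)
			ttl = 0;	/* every hop crosses a namespace */
	}
	return max_hops;		/* loop was never detected */
}
```

With the counter kept intact the simulated loop is cut off after a handful of hops; with the reset it runs until the simulation gives up, which is why keeping the TTL looks like the safer default for now.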

> +void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
> +{
> + if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
> + secpath_reset(skb);
> + skb_dst_drop(skb);
> + nf_reset(skb);
> + skb_init_secmark(skb);
> + skb->mark = 0;
> + skb->priority = 0;
> + skb->nf_trace = 0;
> + skb->ipvs_property = 0;
> +#ifdef CONFIG_NET_SCHED
> + skb->tc_index = 0;
> +#ifdef CONFIG_NET_CLS_ACT
> + skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
> + skb->tc_verd = SET_TC_RTTL(skb->tc_verd, 0);
> +#endif
> +#endif

This makes me wonder which ones we actually should keep. Most of the
others get reinitialized anyway, so maybe it's better to simply clear
the entire area up until ->tail like, for instance, skb_recycle_check().

> + }
> + skb->dev = dev;
> + skb->skb_iif = skb->dev->ifindex;

This doesn't seem necessary; if the packet goes through
netif_receive_skb, it will be set anyway.

> +}
> +EXPORT_SYMBOL(skb_set_dev);

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Arnd Bergmann

On Thursday 26 November 2009, Patrick McHardy wrote:
> In addition to those already handled, I'd say
> 
> - priority: affects qdisc classification, may refer to classes of the
>   old namespace
> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
> - nf_trace: might trigger packet tracing
> - nf_bridge: contains references to network devices in the old NS,
>   also indicates packet was bridged
> - iif: index is only valid in the originating namespace
> - probably secmark.

ok

> - tc_index: classification result, should only be set in the namespace
>   of the classifier
> - tc_verd: RTTL etc. should begin at zero again

Wouldn't that defeat the purpose of RTTL? If you create a loop
across two devices in different namespaces, it may no longer get
detected. Or is that a different problem again?

Arnd <><

---

net: maintain namespace isolation between vlan and real device

In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Still needs testing, don't apply

Signed-off-by: Arnd Bergmann 
---
 drivers/net/macvlan.c |2 +-
 include/linux/netdevice.h |9 +
 net/8021q/vlan_dev.c  |2 +-
 net/core/dev.c|   37 +
 4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 322112c..edcebf1 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
 xmit_world:
-   skb->dev = vlan->lowerdev;
+   skb_set_dev(skb, vlan->lowerdev);
return dev_queue_xmit(skb);
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..fdf4a1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct net_device *dev)
return 0;
 }
 
+#ifndef CONFIG_NET_NS
+static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   skb->dev = dev;
+}
+#else /* CONFIG_NET_NS */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev);
+#endif
+
 static inline bool netdev_uses_trailer_tags(struct net_device *dev)
 {
 #ifdef CONFIG_NET_DSA_TAG_TRAILER
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..51fcfff 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
}
 
 
-   skb->dev = vlan_dev_info(dev)->real_dev;
+   skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
len = skb->len;
ret = dev_queue_xmit(skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f8baa15..220d4e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
if (skb->len > (dev->mtu + dev->hard_header_len))
return NET_RX_DROP;
 
-   skb_dst_drop(skb);
+   skb_set_dev(skb, dev);
skb->tstamp.tv64 = 0;
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, dev);
-   skb->mark = 0;
-   secpath_reset(skb);
-   nf_reset(skb);
return netif_rx(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
@@ -1614,6 +1611,39 @@ static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
return false;
 }
 
+/**
+ * skb_set_dev - assign a buffer to a new device
+ * @skb: buffer for the new device
+ * @dev: network device
+ *
+ * If an skb is owned by a device already, we have to reset
+ * all data private to the namespace a device belongs to
+ * before assigning it a new device.
+ */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
+   secpath_reset(skb);
+   skb_dst_drop(skb);
+   nf_reset(skb);
+   skb_init_secmark(skb);
+   skb->mark = 0;
+   skb->priority = 0;
+   skb->nf_trace = 0;
+   skb->ipvs_property = 0;
+#ifdef CONFIG_NET_SCHED
+   skb->tc_index = 0;
+#ifdef CONFIG_NET_CLS_ACT
+   skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
+   skb->tc_verd = SET_TC_RTTL(skb->tc_verd, 0);
+#endif
+#endif
+   }
+   skb->dev = dev;
+   skb->skb_iif = skb->dev->ifindex;
+}
+EXPORT_SYMBOL(skb_set_dev);
+
 /*
  * Invalidate hardware checksum when packet is to be mangled, and
  * complete checksum manually on outgoing path.

Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Eric W. Biederman

Patrick McHardy  writes:

> Arnd Bergmann wrote:
>> On Tuesday 24 November 2009, Patrick McHardy wrote:
>>> Eric W. Biederman wrote:
>>>> I don't quite follow what you intend with dev_queue_xmit when the macvlan
>>>> is in one namespace and the real physical device is in another.  Are
>>>> you mentioning that the packet classifier runs in the namespace where
>>>> the primary device lives with packets from a different namespace?
>>> Exactly. And I think we should make sure that the namespace of
>>> the macvlan device can't (deliberately or accidentally) cause
>>> misclassification.
>> 
>> This is independent of my series and a preexisting problem, right?
>
> Correct.
>
>> Which fields do you think need to be reset to maintain namespace
>> isolation for the outbound path in macvlan?
>
> In addition to those already handled, I'd say
>
> - priority: affects qdisc classification, may refer to classes of the
>   old namespace
> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
> - nf_trace: might trigger packet tracing
> - nf_bridge: contains references to network devices in the old NS,
>   also indicates packet was bridged
> - iif: index is only valid in the originating namespace
> - tc_index: classification result, should only be set in the namespace
>   of the classifier
> - tc_verd: RTTL etc. should begin at zero again
> - probably secmark.

Wow.  I thought we were trying to reduce skbuff, where did all of those
fields come from?  Regardless, that sounds like a good list to get stomped.

Eric


Re: [PATCHv3 0/4] macvlan: add vepa and bridge mode

2009-11-26 Thread Patrick McHardy

Arnd Bergmann wrote:
> Version 2 description:
> The patch to iproute2 has not changed, so I'm not including
> it this time. Patch 4/4 (the netlink interface) is basically
> unchanged as well but included for completeness.
> 
> The other changes have moved forward a bit, to the point where
> I find them a lot cleaner and am more confident in the code
> being ready for inclusion. The implementation hardly resembles
> Eric's original patch now, so I've dropped his signed-off-by.
> 
> Please take a look and ack if you are happy so we can get it
> into 2.6.33.

Looks good to me, nice work.

Acked-by: Patrick McHardy 

for the entire series.

[PATCHv3 2/4] macvlan: cleanup rx statistics

2009-11-26 Thread Arnd Bergmann

We have very similar code for rx statistics in
two places in the macvlan driver, with a third
one being added in the next patch.

Consolidate them into one function to improve
overall readability of the driver.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/macvlan.c |   70 
 1 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index ae2b5c7..1e7faf9 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -116,42 +116,58 @@ static int macvlan_addr_busy(const struct macvlan_port *port,
return 0;
 }
 
+static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
+   unsigned int len, bool success,
+   bool multicast)
+{
+   struct macvlan_rx_stats *rx_stats;
+
+   rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id());
+   if (likely(success)) {
+   rx_stats->rx_packets++;
+   rx_stats->rx_bytes += len;
+   if (multicast)
+   rx_stats->multicast++;
+   } else {
+   rx_stats->rx_errors++;
+   }
+}
+
+static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev,
+const struct ethhdr *eth)
+{
+   if (!skb)
+   return NET_RX_DROP;
+
+   skb->dev = dev;
+   if (!compare_ether_addr_64bits(eth->h_dest,
+  dev->broadcast))
+   skb->pkt_type = PACKET_BROADCAST;
+   else
+   skb->pkt_type = PACKET_MULTICAST;
+
+   return netif_rx(skb);
+}
+
 static void macvlan_broadcast(struct sk_buff *skb,
  const struct macvlan_port *port)
 {
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
struct hlist_node *n;
-   struct net_device *dev;
struct sk_buff *nskb;
unsigned int i;
-   struct macvlan_rx_stats *rx_stats;
+   int err;
 
if (skb->protocol == htons(ETH_P_PAUSE))
return;
 
for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
-   dev = vlan->dev;
-   rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id());
-
nskb = skb_clone(skb, GFP_ATOMIC);
-   if (nskb == NULL) {
-   rx_stats->rx_errors++;
-   continue;
-   }
-
-   rx_stats->rx_bytes += skb->len + ETH_HLEN;
-   rx_stats->rx_packets++;
-   rx_stats->multicast++;
-
-   nskb->dev = dev;
-   if (!compare_ether_addr_64bits(eth->h_dest, dev->broadcast))
-   nskb->pkt_type = PACKET_BROADCAST;
-   else
-   nskb->pkt_type = PACKET_MULTICAST;
-
-   netif_rx(nskb);
+   err = macvlan_broadcast_one(nskb, vlan->dev, eth);
+   macvlan_count_rx(vlan, skb->len + ETH_HLEN,
+err == NET_RX_SUCCESS, 1);
}
}
 }
@@ -163,7 +179,7 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
const struct macvlan_port *port;
const struct macvlan_dev *vlan;
struct net_device *dev;
-   struct macvlan_rx_stats *rx_stats;
+   unsigned int len;
 
port = rcu_dereference(skb->dev->macvlan_port);
if (port == NULL)
@@ -183,15 +199,11 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
kfree_skb(skb);
return NULL;
}
-   rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id());
+   len = skb->len + ETH_HLEN;
skb = skb_share_check(skb, GFP_ATOMIC);
-   if (skb == NULL) {
-   rx_stats->rx_errors++;
+   macvlan_count_rx(vlan, len, skb != NULL, 0);
+   if (!skb)
return NULL;
-   }
-
-   rx_stats->rx_bytes += skb->len + ETH_HLEN;
-   rx_stats->rx_packets++;
 
skb->dev = dev;
skb->pkt_type = PACKET_HOST;
-- 
1.6.3.3
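
For readers who want to see the consolidation in isolation, the following is a small userspace sketch of the same counting rule, not the driver code. It assumes a plain struct instead of the per-CPU `macvlan_rx_stats`; the names `rx_stats_model` and `count_rx()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU rx statistics. */
struct rx_stats_model {
	unsigned long rx_packets;
	unsigned long rx_bytes;
	unsigned long multicast;
	unsigned long rx_errors;
};

/*
 * One place that decides how a received frame is accounted:
 * a successful delivery bumps packets and bytes (and the multicast
 * counter when applicable), a failed one only bumps rx_errors.
 */
static void count_rx(struct rx_stats_model *s, unsigned int len,
		     bool success, bool multicast)
{
	assert(s != NULL);

	if (success) {
		s->rx_packets++;
		s->rx_bytes += len;
		if (multicast)
			s->multicast++;
	} else {
		s->rx_errors++;
	}
}
```

Every receive path then only has to compute the frame length and report whether delivery succeeded, which is the shape the patch gives to both `macvlan_broadcast` and `macvlan_handle_frame`.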


[PATCHv3 1/4] veth: move loopback logic to common location

2009-11-26 Thread Arnd Bergmann

The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.

Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/veth.c|   17 +++--
 include/linux/netdevice.h |2 ++
 net/core/dev.c|   40 
 3 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2d657f2..6c4b5a2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -155,8 +155,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
struct veth_net_stats *stats, *rcv_stats;
int length, cpu;
 
-   skb_orphan(skb);
-
priv = netdev_priv(dev);
rcv = priv->peer;
rcv_priv = netdev_priv(rcv);
@@ -168,20 +166,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
if (!(rcv->flags & IFF_UP))
goto tx_drop;
 
-   if (skb->len > (rcv->mtu + MTU_PAD))
-   goto rx_drop;
-
-skb->tstamp.tv64 = 0;
-   skb->pkt_type = PACKET_HOST;
-   skb->protocol = eth_type_trans(skb, rcv);
if (dev->features & NETIF_F_NO_CSUM)
skb->ip_summed = rcv_priv->ip_summed;
 
-   skb->mark = 0;
-   secpath_reset(skb);
-   nf_reset(skb);
-
-   length = skb->len;
+   length = skb->len + ETH_HLEN;
+   if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
+   goto rx_drop;
 
stats->tx_bytes += length;
stats->tx_packets++;
@@ -189,7 +179,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
rcv_stats->rx_bytes += length;
rcv_stats->rx_packets++;
 
-   netif_rx(skb);
return NETDEV_TX_OK;
 
 tx_drop:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97873e3..9428793 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,8 @@ extern int dev_set_mac_address(struct net_device *,
 extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev,
struct netdev_queue *txq);
+extern int dev_forward_skb(struct net_device *dev,
+   struct sk_buff *skb);
 
 extern int netdev_budget;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index ccefa24..f8baa15 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -105,6 +105,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1419,6 +1420,45 @@ static inline void net_timestamp(struct sk_buff *skb)
skb->tstamp.tv64 = 0;
 }
 
+/**
+ * dev_forward_skb - loopback an skb to another netif
+ *
+ * @dev: destination network device
+ * @skb: buffer to forward
+ *
+ * return values:
+ * NET_RX_SUCCESS  (no congestion)
+ * NET_RX_DROP (packet was dropped)
+ *
+ * dev_forward_skb can be used for injecting an skb from the
+ * start_xmit function of one device into the receive queue
+ * of another device.
+ *
+ * The receiving device may be in another namespace, so
+ * we have to clear all information in the skb that could
+ * impact namespace isolation.
+ */
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+{
+   skb_orphan(skb);
+
+   if (!(dev->flags & IFF_UP))
+   return NET_RX_DROP;
+
+   if (skb->len > (dev->mtu + dev->hard_header_len))
+   return NET_RX_DROP;
+
+   skb_dst_drop(skb);
+   skb->tstamp.tv64 = 0;
+   skb->pkt_type = PACKET_HOST;
+   skb->protocol = eth_type_trans(skb, dev);
+   skb->mark = 0;
+   secpath_reset(skb);
+   nf_reset(skb);
+   return netif_rx(skb);
+}
+EXPORT_SYMBOL_GPL(dev_forward_skb);
+
 /*
  * Support routine. Sends outgoing frames to any network
  * taps currently in use.
-- 
1.6.3.3
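
As a rough illustration of the checks `dev_forward_skb` performs before handing the buffer to the receive path, here is a userspace model. The types `fake_dev` and `fake_pkt`, the flag `FLAG_UP`, and the result codes are hypothetical stand-ins; only the order of the checks and the size limit (MTU plus link-layer header length) are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_UP 0x1	/* hypothetical stand-in for IFF_UP */

enum rx_result { RX_SUCCESS = 0, RX_DROP = 1 };

struct fake_dev {
	unsigned int flags;
	unsigned int mtu;
	unsigned int hard_header_len;
};

struct fake_pkt {
	unsigned int len;
	unsigned int mark;
};

/*
 * Drop if the receiver is down or the frame is too large for it,
 * otherwise clear state that must not leak to the receiver and
 * report success.
 */
static enum rx_result forward_pkt(const struct fake_dev *dev,
				  struct fake_pkt *p)
{
	assert(dev != NULL && p != NULL);

	if (!(dev->flags & FLAG_UP))
		return RX_DROP;
	if (p->len > dev->mtu + dev->hard_header_len)
		return RX_DROP;

	p->mark = 0;
	return RX_SUCCESS;
}
```

A caller such as veth only needs to look at the return value to decide whether to count the frame as transmitted or dropped.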


[PATCHv3 4/4] macvlan: export macvlan mode through netlink

2009-11-26 Thread Arnd Bergmann

In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.

This depends on a matching patch to iproute2
in order to become accessible in user land.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/macvlan.c   |   56 +-
 include/linux/if_link.h |   15 
 2 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d6bd843..322112c 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -33,12 +33,6 @@
 
 #define MACVLAN_HASH_SIZE  (1 << BITS_PER_BYTE)
 
-enum macvlan_mode {
-   MACVLAN_MODE_PRIVATE= 1,
-   MACVLAN_MODE_VEPA   = 2,
-   MACVLAN_MODE_BRIDGE = 4,
-};
-
 struct macvlan_port {
struct net_device   *dev;
struct hlist_head   vlan_hash[MACVLAN_HASH_SIZE];
@@ -614,6 +608,17 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[])
if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
return -EADDRNOTAVAIL;
}
+
+   if (data && data[IFLA_MACVLAN_MODE]) {
+   switch (nla_get_u32(data[IFLA_MACVLAN_MODE])) {
+   case MACVLAN_MODE_PRIVATE:
+   case MACVLAN_MODE_VEPA:
+   case MACVLAN_MODE_BRIDGE:
+   break;
+   default:
+   return -EINVAL;
+   }
+   }
return 0;
 }
 
@@ -678,6 +683,10 @@ static int macvlan_newlink(struct net *src_net, struct net_device *dev,
vlan->dev  = dev;
vlan->port = port;
 
+   vlan->mode = MACVLAN_MODE_VEPA;
+   if (data && data[IFLA_MACVLAN_MODE])
+   vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+
err = register_netdevice(dev);
if (err < 0)
return err;
@@ -699,6 +708,36 @@ static void macvlan_dellink(struct net_device *dev, struct list_head *head)
macvlan_port_destroy(port->dev);
 }
 
+static int macvlan_changelink(struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[])
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+   if (data && data[IFLA_MACVLAN_MODE])
+   vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+   return 0;
+}
+
+static size_t macvlan_get_size(const struct net_device *dev)
+{
+   return nla_total_size(4);
+}
+
+static int macvlan_fill_info(struct sk_buff *skb,
+   const struct net_device *dev)
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+
+   NLA_PUT_U32(skb, IFLA_MACVLAN_MODE, vlan->mode);
+   return 0;
+
+nla_put_failure:
+   return -EMSGSIZE;
+}
+
+static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
+   [IFLA_MACVLAN_MODE] = { .type = NLA_U32 },
+};
+
 static struct rtnl_link_ops macvlan_link_ops __read_mostly = {
.kind   = "macvlan",
.priv_size  = sizeof(struct macvlan_dev),
@@ -707,6 +746,11 @@ static struct rtnl_link_ops macvlan_link_ops __read_mostly = {
.validate   = macvlan_validate,
.newlink= macvlan_newlink,
.dellink= macvlan_dellink,
+   .maxtype= IFLA_MACVLAN_MAX,
+   .policy = macvlan_policy,
+   .changelink = macvlan_changelink,
+   .get_size   = macvlan_get_size,
+   .fill_info  = macvlan_fill_info,
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 1d3b242..6674791 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -181,4 +181,19 @@ struct ifla_vlan_qos_mapping {
__u32 to;
 };
 
+/* MACVLAN section */
+enum {
+   IFLA_MACVLAN_UNSPEC,
+   IFLA_MACVLAN_MODE,
+   __IFLA_MACVLAN_MAX,
+};
+
+#define IFLA_MACVLAN_MAX (__IFLA_MACVLAN_MAX - 1)
+
+enum macvlan_mode {
+   MACVLAN_MODE_PRIVATE = 1, /* don't talk to other macvlans */
+   MACVLAN_MODE_VEPA= 2, /* talk to other ports through ext bridge */
+   MACVLAN_MODE_BRIDGE  = 4, /* talk to bridge ports directly */
+};
+
 #endif /* _LINUX_IF_LINK_H */
-- 
1.6.3.3
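
The mode handling in this patch can be modelled outside the kernel roughly as follows. This is only a sketch: `validate_mode()` and `pick_mode()` are hypothetical helper names, and the netlink attribute parsing is replaced by a plain "was a mode supplied" flag. The accepted values and the VEPA default are the ones shown in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum macvlan_mode {
	MACVLAN_MODE_PRIVATE = 1,
	MACVLAN_MODE_VEPA    = 2,
	MACVLAN_MODE_BRIDGE  = 4,
};

/* Accept exactly one of the three known modes, reject everything else. */
static int validate_mode(uint32_t mode)
{
	switch (mode) {
	case MACVLAN_MODE_PRIVATE:
	case MACVLAN_MODE_VEPA:
	case MACVLAN_MODE_BRIDGE:
		return 0;
	default:
		return -EINVAL;
	}
}

/*
 * Choose the mode for a new device: default to VEPA when the caller
 * did not supply one, otherwise take the validated request.
 */
static int pick_mode(int has_mode, uint32_t requested, uint32_t *out)
{
	assert(out != NULL);

	if (!has_mode) {
		*out = MACVLAN_MODE_VEPA;
		return 0;
	}
	if (validate_mode(requested) != 0)
		return -EINVAL;
	*out = requested;
	return 0;
}
```

Note that the mode values are single bits, so a combined value such as 3 is rejected here even though each bit on its own is valid.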


[PATCHv3 3/4] macvlan: implement bridge, VEPA and private mode

2009-11-26 Thread Arnd Bergmann

This allows each macvlan slave device to be in one
of three modes, depending on the use case:

MACVLAN_PRIVATE:
  The device never communicates with any other device
  on the same upper_dev. This even includes frames
  coming back from a reflective relay, where supported
  by the adjacent bridge.

MACVLAN_VEPA:
  The new Virtual Ethernet Port Aggregator (VEPA) mode.
  We assume that the adjacent bridge returns all frames
  where both source and destination are local to the
  macvlan port, i.e. the bridge is set up as a reflective
  relay.
  Broadcast frames coming in from the upper_dev get
  flooded to all macvlan interfaces in VEPA mode.
  We never deliver any frames locally.

MACVLAN_BRIDGE:
  We provide the behavior of a simple bridge between
  different macvlan interfaces on the same port. Frames
  from one interface to another one get delivered directly
  and are not sent out externally. Broadcast frames get
  flooded to all other bridge ports and to the external
  interface, but when they come back from a reflective
  relay, we don't deliver them again.
  Since we know all the MAC addresses, the macvlan bridge
  mode does not require learning or STP like the bridge
  module does.

Based on an earlier patch "macvlan: Reflect macvlan packets
meant for other macvlan devices" by Eric Biederman.

Signed-off-by: Arnd Bergmann 
Cc: Eric Biederman 
---
 drivers/net/macvlan.c |   80 -
 1 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 1e7faf9..d6bd843 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -29,9 +29,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MACVLAN_HASH_SIZE  (1 << BITS_PER_BYTE)
 
+enum macvlan_mode {
+   MACVLAN_MODE_PRIVATE= 1,
+   MACVLAN_MODE_VEPA   = 2,
+   MACVLAN_MODE_BRIDGE = 4,
+};
+
 struct macvlan_port {
struct net_device   *dev;
struct hlist_head   vlan_hash[MACVLAN_HASH_SIZE];
@@ -59,6 +66,7 @@ struct macvlan_dev {
struct macvlan_port *port;
struct net_device   *lowerdev;
struct macvlan_rx_stats *rx_stats;
+   enum macvlan_mode   mode;
 };
 
 
@@ -134,11 +142,14 @@ static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
 }
 
 static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev,
-const struct ethhdr *eth)
+const struct ethhdr *eth, bool local)
 {
if (!skb)
return NET_RX_DROP;
 
+   if (local)
+   return dev_forward_skb(dev, skb);
+
skb->dev = dev;
if (!compare_ether_addr_64bits(eth->h_dest,
   dev->broadcast))
@@ -150,7 +161,9 @@ static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev,
 }
 
 static void macvlan_broadcast(struct sk_buff *skb,
- const struct macvlan_port *port)
+ const struct macvlan_port *port,
+ struct net_device *src,
+ enum macvlan_mode mode)
 {
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
@@ -164,8 +177,12 @@ static void macvlan_broadcast(struct sk_buff *skb,
 
for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
+   if (vlan->dev == src || !(vlan->mode & mode))
+   continue;
+
nskb = skb_clone(skb, GFP_ATOMIC);
-   err = macvlan_broadcast_one(nskb, vlan->dev, eth);
+   err = macvlan_broadcast_one(nskb, vlan->dev, eth,
+mode == MACVLAN_MODE_BRIDGE);
macvlan_count_rx(vlan, skb->len + ETH_HLEN,
 err == NET_RX_SUCCESS, 1);
}
@@ -178,6 +195,7 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_port *port;
const struct macvlan_dev *vlan;
+   const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
 
@@ -186,7 +204,25 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
return skb;
 
if (is_multicast_ether_addr(eth->h_dest)) {
-   macvlan_broadcast(skb, port);
+   src = macvlan_hash_lookup(port, eth->h_source);
+   if (!src)
+   /* frame comes from an external address */
+   macvlan_broadcast(skb, port, NULL,
+ MACVLAN_MODE_PRIVATE |
+ MACVLAN_MODE_VEPA|
+ MACVLAN_MODE_BRIDGE);
+   el
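
The archived copy of this patch is cut off at this point. The filter that is visible in the `macvlan_broadcast` hunk above (skip the sending device, skip devices whose mode is not in the requested mask) can still be sketched on its own. The model below is userspace code with hypothetical names (`port_dev`, `broadcast_targets()`, the `MODE_*` constants mirror the enum values from the patch); it says nothing about the part of the patch that is missing.

```c
#include <assert.h>
#include <stddef.h>

enum { MODE_PRIVATE = 1, MODE_VEPA = 2, MODE_BRIDGE = 4 };

/* Hypothetical stand-in for one macvlan device on a port. */
struct port_dev {
	int id;
	unsigned int mode;
};

/*
 * Count the devices that would receive a broadcast frame: the sender
 * itself is skipped, and so is every device whose mode bit is not
 * contained in mode_mask.
 */
static size_t broadcast_targets(const struct port_dev *devs, size_t n,
				const struct port_dev *src,
				unsigned int mode_mask)
{
	size_t i, count = 0;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (&devs[i] == src || !(devs[i].mode & mode_mask))
			continue;
		count++;
	}
	return count;
}
```

Because the modes are distinct bits, a single mask can express "all devices" (for a frame from an external address) as well as "only bridge-mode devices".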

[PATCHv3 0/4] macvlan: add vepa and bridge mode

2009-11-26 Thread Arnd Bergmann

Not many changes this time, just integrated a bug fix
and all the coding style feedback from Eric Dumazet
and Patrick McHardy.

I'll keep the patch for network namespaces on the tx
path out of this series for now, because the discussion
is still ongoing and it addresses an unrelated issue.

---

Version 2 description:
The patch to iproute2 has not changed, so I'm not including
it this time. Patch 4/4 (the netlink interface) is basically
unchanged as well but included for completeness.

The other changes have moved forward a bit, to the point where
I find them a lot cleaner and am more confident in the code
being ready for inclusion. The implementation hardly resembles
Eric's original patch now, so I've dropped his signed-off-by.

Please take a look and ack if you are happy so we can get it
into 2.6.33.

---

Version 1 description:
This is based on an earlier patch from Eric Biederman adding
forwarding between macvlans. I extended his approach to
allow the administrator to choose the mode for each macvlan,
and to implement a functional VEPA between macvlan.

Still missing from this is support for communication between
the macvlans and the lower device that they are based on. This
would be extremely useful but, as others have found out before
me, requires significant changes not only to macvlan but also
to the common transmit path.

I've tested VEPA operation with the hairpin support
added to the bridge driver by Anna Fischer.

---
Arnd Bergmann (4):
  veth: move loopback logic to common location
  macvlan: cleanup rx statistics
  macvlan: implement bridge, VEPA and private mode
  macvlan: export macvlan mode through netlink

 drivers/net/macvlan.c |  188 +
 drivers/net/veth.c|   17 +
 include/linux/if_link.h   |   15 
 include/linux/netdevice.h |2 +
 net/core/dev.c|   40 ++
 5 files changed, 214 insertions(+), 48 deletions(-)


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Patrick McHardy

Arnd Bergmann wrote:
> On Tuesday 24 November 2009, Patrick McHardy wrote:
>> Eric W. Biederman wrote:
>>> I don't quite follow what you intend with dev_queue_xmit when the macvlan
>>> is in one namespace and the real physical device is in another.  Are
>>> you mentioning that the packet classifier runs in the namespace where
>>> the primary device lives with packets from a different namespace?
>> Exactly. And I think we should make sure that the namespace of
>> the macvlan device can't (deliberately or accidentally) cause
>> misclassification.
> 
> This is independent of my series and a preexisting problem, right?

Correct.

> Which fields do you think need to be reset to maintain namespace
> isolation for the outbound path in macvlan?

In addition to those already handled, I'd say

- priority: affects qdisc classification, may refer to classes of the
  old namespace
- ipvs_property: might cause packets to incorrectly skip netfilter hooks
- nf_trace: might trigger packet tracing
- nf_bridge: contains references to network devices in the old NS,
  also indicates packet was bridged
- iif: index is only valid in the originating namespace
- tc_index: classification result, should only be set in the namespace
  of the classifier
- tc_verd: RTTL etc. should begin at zero again
- probably secmark.
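
Purely as an illustration of the list above, here is a userspace sketch of what "reset on namespace change" could look like. The types `ns`, `dev` and `pkt` and the function `pkt_set_dev()` are hypothetical stand-ins, not kernel API; nf_bridge, secmark and tc_verd are left out to keep the model small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct ns { int id; };
struct dev { const struct ns *net; int ifindex; };

struct pkt {
	const struct dev *dev;
	uint32_t mark;
	uint32_t priority;
	uint16_t tc_index;
	int iif;
	unsigned int nf_trace:1;
	unsigned int ipvs_property:1;
};

/*
 * Attach the packet to a new device. Classification state is wiped
 * only when the new device lives in a different namespace than the
 * device the packet currently belongs to.
 */
static void pkt_set_dev(struct pkt *p, const struct dev *to)
{
	assert(p != NULL && to != NULL);

	if (p->dev != NULL && p->dev->net != to->net) {
		p->mark = 0;
		p->priority = 0;
		p->tc_index = 0;
		p->iif = 0;
		p->nf_trace = 0;
		p->ipvs_property = 0;
	}
	p->dev = to;
}
```

Moving a packet between two devices of the same namespace leaves all fields alone, so the common case pays only for one pointer comparison.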


Re: [PATCH 1/4] veth: move loopback logic to common location

2009-11-26 Thread Arnd Bergmann

On Tuesday 24 November 2009, Patrick McHardy wrote:
> Eric W. Biederman wrote:
> > I don't quite follow what you intend with dev_queue_xmit when the macvlan
> > is in one namespace and the real physical device is in another.  Are
> > you mentioning that the packet classifier runs in the namespace where
> > the primary device lives with packets from a different namespace?
> 
> Exactly. And I think we should make sure that the namespace of
> the macvlan device can't (deliberately or accidentally) cause
> misclassification.

This is independent of my series and a preexisting problem, right?

Which fields do you think need to be reset to maintain namespace
isolation for the outbound path in macvlan?
---
net: maintain namespace isolation between vlan and real device

In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Signed-off-by: Arnd Bergmann 

---
 drivers/net/macvlan.c |2 +-
 include/linux/netdevice.h |9 +
 net/8021q/vlan_dev.c  |2 +-
 net/core/dev.c|   26 ++
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 322112c..edcebf1 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
 xmit_world:
-   skb->dev = vlan->lowerdev;
+   skb_set_dev(skb, vlan->lowerdev);
return dev_queue_xmit(skb);
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..fdf4a1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct net_device *dev)
return 0;
 }
 
+#ifndef CONFIG_NET_NS
+static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   skb->dev = dev;
+}
+#else /* CONFIG_NET_NS */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev);
+#endif
+
 static inline bool netdev_uses_trailer_tags(struct net_device *dev)
 {
 #ifdef CONFIG_NET_DSA_TAG_TRAILER
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..51fcfff 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
}
 
 
-   skb->dev = vlan_dev_info(dev)->real_dev;
+   skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
len = skb->len;
ret = dev_queue_xmit(skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f8baa15..7179b58 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
if (skb->len > (dev->mtu + dev->hard_header_len))
return NET_RX_DROP;
 
-   skb_dst_drop(skb);
+   skb_set_dev(skb, dev);
skb->tstamp.tv64 = 0;
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, dev);
-   skb->mark = 0;
-   secpath_reset(skb);
-   nf_reset(skb);
return netif_rx(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
@@ -1614,6 +1611,27 @@ static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
return false;
 }
 
+/**
+ * skb_set_dev - assign a buffer to a new device
+ * @skb: buffer for the new device
+ * @dev: network device
+ *
+ * If an skb is owned by a device already, we have to reset
+ * all data private to the namespace a device belongs to
+ * before assigning it a new device.
+ */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+   if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
+   secpath_reset(skb);
+   skb_dst_drop(skb);
+   nf_reset(skb);
+   skb->mark = 0;
+   }
+   skb->dev = dev;
+}
+EXPORT_SYMBOL(skb_set_dev);
+
 /*
  * Invalidate hardware checksum when packet is to be mangled, and
  * complete checksum manually on outgoing path.
