Re: Moving frags and SKBTX_DEV_ZEROCOPY skbs

2014-05-15 Thread Zoltan Kiss

On 14/05/14 20:41, Zoltan Kiss wrote:

But here is the thing: deliver_skb calls orphan_frags for every packet
delivered to the local stack, so we are safe IF these functions are
called before the IP stack. So we are safe now, but things can go wrong,
if:
- such a frag-mangling function is called before deliver_skb, now or in
the future
- if someone wants to take advantage of zerocopy in the guest<->backend
path


Running through the code I've found the following core functions can 
shuffle frags between skbs (and don't handle zerocopy skbs already):

skb_gro_receive
skb_shift
skb_split

None of them can meet at the moment with zerocopy skbs, but it's better 
to keep it in mind for the future, that would blow up these kind of skbs.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Moving frags and SKBTX_DEV_ZEROCOPY skbs

2014-05-14 Thread Zoltan Kiss

On 14/05/14 15:23, Eric Dumazet wrote:

On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:

Hi,

Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where
the frags list were modified. I came across this function skb_shift(),
which moves frags between skbs. And there are a lot more of such kind,
skb_split or skb_try_coalesce, for example.
It could be a dangerous thing if a frag is referenced from an skb which
doesn't have the original destructor_arg, and to avoid that
skb_orphan_frags should be called. Although probably these functions are
not normally touched in usual usecases, I think it would be useful to
review core skb functions proactively and add an skb_orphan_frags
everywhere where the frags could be referenced from other places.
Any opinion about this?



For skb_shift(), it is currently used from tcp stack only, where
this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
bug for the moment.
It is called from tcp_input.c, which suggests it can be called on 
incoming TCP packets. If the backend domain communicates with the 
frontend through sockets, zerocopy packets can turn up here.
But here is the thing: deliver_skb calls orphan_frags for every packet 
delivered to the local stack, so we are safe IF these functions are 
called before the IP stack. So we are safe now, but things can go wrong, if:
- such a frag-mangling function is called before deliver_skb, now or in 
the future

- if someone wants to take advantage of zerocopy in the guest<->backend path



I already gave a patch for skb_try_coalesce() : For this one we do not
wan skb_orphan_frags() overhead. Its simply better in this case to
abort.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1b62343f5837..85995a14aafc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3838,7 +3839,10 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff 
*from,
return true;
}

-   if (skb_has_frag_list(to) || skb_has_frag_list(from))
+   if (skb_has_frag_list(to) ||
+   skb_has_frag_list(from) ||
+   (skb_shinfo(to)->tx_flags & SKBTX_DEV_ZEROCOPY) ||
+   (skb_shinfo(from)->tx_flags & SKBTX_DEV_ZEROCOPY))
return false;

if (skb_headlen(from) != 0) {





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Moving frags and SKBTX_DEV_ZEROCOPY skbs

2014-05-14 Thread Zoltan Kiss

Hi,

Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where 
the frags list were modified. I came across this function skb_shift(), 
which moves frags between skbs. And there are a lot more of such kind, 
skb_split or skb_try_coalesce, for example.
It could be a dangerous thing if a frag is referenced from an skb which 
doesn't have the original destructor_arg, and to avoid that 
skb_orphan_frags should be called. Although probably these functions are 
not normally touched in usual usecases, I think it would be useful to 
review core skb functions proactively and add an skb_orphan_frags 
everywhere where the frags could be referenced from other places.

Any opinion about this?

Regards,

Zoltan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.13] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-04-23 Thread Zoltan Kiss

On 22/04/14 20:18, Ben Hutchings wrote:

From: Zoltan Kiss 

commit 36d5fe6a000790f56039afe26834265db0a3ad4c upstream.

skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.13: skb_zerocopy() is new in 3.14, but was moved from a
  static function in nfnetlink_queue.  We need to patch that and its caller, but
  not openvswitch.]
Signed-off-by: Ben Hutchings 
---
On Tue, 2014-04-22 at 19:02 +0100, Zoltan Kiss wrote:

On 22/04/14 16:38, Ben Hutchings wrote:

On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote:

Hi David,

On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote:

From: Zoltan Kiss 
Date: Wed, 26 Mar 2014 22:37:45 +


skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 


Fingers crossed :-)  Applied, thanks Zoltan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Could you please queue this for stable as well?  It has CVE-2014-2568
assigned to it and I believe it is applicable to some of the trees.


It was applied to 3.13, but needs backporting to earlier versions.  I
posted my attempt in
<1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk> but it needs
testing/reviewing.


This one is a different issue, although it is very similar.


Sorry for mixing these up.  I believe this bug has been present since
zerocopy was added to nfqueue in Linux 3.10 (commit ae08ce002108).  This
is the version I used for Debian's 3.13 branch, which might be usable
for older stable branches too.

Seems OK, thanks!



Ben.

---
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,22 +235,23 @@ nfqnl_flush(struct nfqnl_instance *queue
spin_unlock_bh(&queue->lock);
  }

-static void
+static int
  nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
  {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;

/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);

if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -268,6 +269,11 @@ nfqnl_zcopy(struct sk_buff *to, const st
to->len += len + plen;
to->data_len += len + plen;

+   if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+   skb_tx_error(from);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -278,6 +284,8 @@ nfqnl_zcopy(struct sk_buff *to, const st
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
  }

  static int
@@ -374,13 +382,16 @@ nfqnl_build_packet_message(struct net *n

skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
  GFP_ATOMIC);
-   if (!skb)
+   if (!skb) {
+   skb_tx_error(entskb);
return NULL;
+   }

nlh = nlmsg_put(skb, 0, 0,
NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
sizeof(struct nfgenmsg), 0);
if (!nlh) {
+   skb_tx_error(entskb);
kfree_skb(skb);
return NULL;
}
@@ -504,13 +515,15 @@ nfqnl_build_packet_message(struct net *n
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);

-   nfqnl_zcopy(skb, entskb, data_len, hlen);
+   if (nfqnl_zcopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}

nlh->nlmsg_len = skb->len;
return skb;

  nla_put_failure:
+   skb_tx_error(entskb);
 

Re: [PATCH v5] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-04-22 Thread Zoltan Kiss

On 22/04/14 16:38, Ben Hutchings wrote:

On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote:

Hi David,

On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote:

From: Zoltan Kiss 
Date: Wed, 26 Mar 2014 22:37:45 +


skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 


Fingers crossed :-)  Applied, thanks Zoltan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Could you please queue this for stable as well?  It has CVE-2014-2568
assigned to it and I believe it is applicable to some of the trees.


It was applied to 3.13, but needs backporting to earlier versions.  I
posted my attempt in
<1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk> but it needs
testing/reviewing.


This one is a different issue, although it is very similar.

Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-26 Thread Zoltan Kiss

On 26/03/14 20:12, David Miller wrote:

From: David Miller 
Date: Wed, 26 Mar 2014 15:59:58 -0400 (EDT)


From: Zoltan Kiss 
Date: Fri, 21 Mar 2014 10:31:34 +


skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 


Applied, thanks Zoltan.


Actually, Zoltan, you have to fix this:

net/core/skbuff.c: In function ‘skb_zerocopy’:
net/core/skbuff.c:2172:2: warning: passing argument 1 of ‘skb_orphan_frags’ 
discards ‘const’ qualifi
er from pointer target type [enabled by default]
In file included from include/linux/tcp.h:21:0,
  from net/core/skbuff.c:50:
include/linux/skbuff.h:1904:19: note: expected ‘struct sk_buff *’ but argument 
is of type ‘const str
uct sk_buff *’
net/core/skbuff.c:2173:3: warning: passing argument 1 of ‘skb_tx_error’ 
discards ‘const’ qualifier f
rom pointer target type [enabled by default]
net/core/skbuff.c:642:6: note: expected ‘struct sk_buff *’ but argument is of 
type ‘const struct sk_
buff *’



Ok, resubmitted. 'from' is now not a const parameter, because 
skb->pfmemalloc might change.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-26 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 
---
v2: orphan the frags right before touching the frags

v3:
- orphan 'from' instead of 'to'
- call skb_tx_error() in the callers if something went wrong

v4: correctly use error path in queue_userspace_packet

v5: remove const qualifier of "from" in skb_zerocopy parameters

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..0db91fa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..908ad98 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,31 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of @from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with skb geometry
  */
-void
-skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+int
+skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2163,6 +2169,11 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
to->len += len + plen;
to->data_len += len + plen;
 
+   if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+   skb_tx_error(from);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..108120f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -354,13 +354,16 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
 
skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
  GFP_ATOMIC);
-   if (!skb)
+   if (!skb) {
+   skb_tx_error(entskb);
return NULL;
+   }
 
nlh = nlmsg_put(skb, 0, 0,
NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
sizeof(struct nfgenmsg), 0);
if (!nlh) {
+   skb_tx_error(entskb);
kfree_skb(skb);
return NULL;
}
@@ -488,13 +491,15 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
return skb;
 
 nla_put_failure:
+   skb_tx_error(en

Re: [PATCH v3] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-21 Thread Zoltan Kiss

On 20/03/14 22:22, Thomas Graf wrote:

On 03/20/2014 05:02 PM, Zoltan Kiss wrote:

--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,7 +464,9 @@ static int queue_userspace_packet(struct datapath
*dp, struct sk_buff *skb,
  }
  nla->nla_len = nla_attr_size(skb->len);

-skb_zerocopy(user_skb, skb, skb->len, hlen);
+err = skb_zerocopy(user_skb, skb, skb->len, hlen);
+if (err)
+goto out;

  /* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
  if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
@@ -478,6 +480,7 @@ static int queue_userspace_packet(struct datapath
*dp, struct sk_buff *skb,

  err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb,
upcall_info->portid);
  out:
+skb_tx_error(skb);


Really sorry to bug you again. We'll get this right ;-)

Patch looks great except for this call to skb_tx_error() which is now
done in the error *and* success path. Make the call conditional on
if (err) and we're good.

Ah, sorry, I'm really sloppy with this patch. I've sent a fixed one.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-21 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 
---
v2: orphan the frags right before touching the frags

v3:
- orphan 'from' instead of 'to'
- call skb_tx_error() in the callers if something went wrong

v4: correctly use error path in queue_userspace_packet

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..35c4e85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..4cf0ee5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,31 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of @from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with skb geometry
  */
-void
+int
 skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2163,6 +2169,11 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
to->len += len + plen;
to->data_len += len + plen;
 
+   if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+   skb_tx_error(from);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..108120f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -354,13 +354,16 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
 
skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
  GFP_ATOMIC);
-   if (!skb)
+   if (!skb) {
+   skb_tx_error(entskb);
return NULL;
+   }
 
nlh = nlmsg_put(skb, 0, 0,
NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
sizeof(struct nfgenmsg), 0);
if (!nlh) {
+   skb_tx_error(entskb);
kfree_skb(skb);
return NULL;
}
@@ -488,13 +491,15 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
return skb;
 
 nla_put_failure:
+   skb_tx_error(entskb);
kfree_skb(skb);
net_err_ratelimited("nf_queue: error creating packet message\n");
return NULL;
diff --

[PATCH v3] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-20 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss 
---
v2: orphan the frags right before touching the frags

v3:
- orphan 'from' instead of 'to'
- call skb_tx_error() in the callers if something went wrong

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..35c4e85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..73f0291 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,31 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with geometry
  */
-void
+int
 skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2163,6 +2169,11 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
to->len += len + plen;
to->data_len += len + plen;
 
+   if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+   skb_tx_error(from);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..108120f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -354,13 +354,16 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
 
skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
  GFP_ATOMIC);
-   if (!skb)
+   if (!skb) {
+   skb_tx_error(entskb);
return NULL;
+   }
 
nlh = nlmsg_put(skb, 0, 0,
NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
sizeof(struct nfgenmsg), 0);
if (!nlh) {
+   skb_tx_error(entskb);
kfree_skb(skb);
return NULL;
}
@@ -488,13 +491,15 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
return skb;
 
 nla_put_failure:
+   skb_tx_error(entskb);
kfree_skb(skb);
net_err_ratelimited("nf_queue: error creating packet message\n");
return NULL;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
inde

Re: [PATCH v2] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-20 Thread Zoltan Kiss

On 20/03/14 12:33, Thomas Graf wrote:

On 03/20/2014 01:16 PM, Thomas Graf wrote:

On 03/19/2014 10:07 PM, Zoltan Kiss wrote:

skb_zerocopy can copy elements of the frags array between skbs, but it
doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care
of that
as well.

Signed-off-by: Zoltan Kiss 


Acked-by: Thomas Graf 


I take this back ;)


---
+if (unlikely(skb_orphan_frags(to, GFP_ATOMIC))) {
+skb_tx_error(to);
+return -ENOMEM;
+}


Just noticed that you orphan the Netlink skb frags which do not
exist yet instead of the source skb frags.

Oh, sorry, I'll fix this up.




Did you consider calling skb_tx_error() for Netlink message
allocation failures for the upcall as well? That memory pressure
is currently not reported back.
Yeah, that makes sense, I'll fix it the callers. Btw. I guess that just 
provides some statistical data for the creator of the skb. At least for 
netback it only increment a stat counter.


Zoli


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-19 Thread Zoltan Kiss

On 19/03/14 20:47, Thomas Graf wrote:

On 03/19/2014 09:38 PM, Zoltan Kiss wrote:

skb_zerocopy can copy elements of the frags array between skbs, but it
doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care
of that
as well.

Signed-off-by: Zoltan Kiss 
---



+if (unlikely(skb_orphan_frags(to, GFP_ATOMIC))) {
+skb_tx_error(to);
+return -ENOMEM;
  }


I think you should move this down to right before we iterate over the
frags.


I agree, I was mislead by that __skb_fill_page_desc, but now I see it is 
actually the head buffer of the source, not a frag from there. And in 
case of a failure in skb_copy_bits we can spare an orphan_frag.


Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-19 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well.

Signed-off-by: Zoltan Kiss 
---
v2: orphan the frags right before touching the frags

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..35c4e85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..5cc99b1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,31 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with geometry
  */
-void
+int
 skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
-   }
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2163,6 +2169,11 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
to->len += len + plen;
to->data_len += len + plen;
 
+   if (unlikely(skb_orphan_frags(to, GFP_ATOMIC))) {
+   skb_tx_error(to);
+   return -ENOMEM;
+   }
+
for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
if (!len)
break;
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..ea02d16 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -488,7 +488,8 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c53fe0c..9a9f668 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,7 +464,9 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
}
nla->nla_len = nla_attr_size(skb->len);
 
-   skb_zerocopy(user_skb, skb, skb->len, hlen);
+   err = skb_zerocopy(user_skb, skb, skb->len, hlen);
+   if (err)
+   goto out;
 
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-19 Thread Zoltan Kiss

On 19/03/14 20:24, David Miller wrote:

From: Zoltan Kiss 
Date: Tue, 18 Mar 2014 21:17:35 +


skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well.

Signed-off-by: Zoltan Kiss 
---
  net/openvswitch/datapath.c |6 ++


Something is seriously wrong with the diffstat generated for
your submission, please fix this and resubmit.


Sorry, I've turned off diffstat in guilt a while ago, but this patch had 
an old version of it. I've resent it.


Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-19 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well.

Signed-off-by: Zoltan Kiss 
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..35c4e85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..60f623c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,36 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with geometry
  */
-void
+int
 skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
+
+   if (unlikely(skb_orphan_frags(to, GFP_ATOMIC))) {
+   skb_tx_error(to);
+   return -ENOMEM;
}
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..ea02d16 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -488,7 +488,8 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c53fe0c..9a9f668 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,7 +464,9 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
}
nla->nla_len = nla_attr_size(skb->len);
 
-   skb_zerocopy(user_skb, skb, skb->len, hlen);
+   err = skb_zerocopy(user_skb, skb, skb->len, hlen);
+   if (err)
+   goto out;
 
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

2014-03-18 Thread Zoltan Kiss
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well.

Signed-off-by: Zoltan Kiss 
---
 net/openvswitch/datapath.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..35c4e85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int 
offset,
unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
- int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..60f623c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,36 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  * The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  * headroom in the `to` buffer.
+ *
+ * Return value:
+ * 0: everything is OK
+ * -ENOMEM: couldn't orphan frags of from due to lack of memory
+ * -EFAULT: skb_copy_bits() found some problem with geometry
  */
-void
+int
 skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 {
int i, j = 0;
int plen = 0; /* length of skb->head fragment */
+   int ret;
struct page *page;
unsigned int offset;
 
BUG_ON(!from->head_frag && !hlen);
 
/* dont bother with small payloads */
-   if (len <= skb_tailroom(to)) {
-   skb_copy_bits(from, 0, skb_put(to, len), len);
-   return;
+   if (len <= skb_tailroom(to))
+   return skb_copy_bits(from, 0, skb_put(to, len), len);
+
+   if (unlikely(skb_orphan_frags(to, GFP_ATOMIC))) {
+   skb_tx_error(to);
+   return -ENOMEM;
}
 
if (hlen) {
-   skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+   if (unlikely(ret))
+   return ret;
len -= hlen;
} else {
plen = min_t(int, skb_headlen(from), len);
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff 
*from, int len, int hlen)
j++;
}
skb_shinfo(to)->nr_frags = j;
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);
 
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..ea02d16 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -488,7 +488,8 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
nla->nla_type = NFQA_PAYLOAD;
nla->nla_len = nla_attr_size(data_len);
 
-   skb_zerocopy(skb, entskb, data_len, hlen);
+   if (skb_zerocopy(skb, entskb, data_len, hlen))
+   goto nla_put_failure;
}
 
nlh->nlmsg_len = skb->len;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c53fe0c..9a9f668 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,7 +464,9 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
}
nla->nla_len = nla_attr_size(skb->len);
 
-   skb_zerocopy(user_skb, skb, skb->len, hlen);
+   err = skb_zerocopy(user_skb, skb, skb->len, hlen);
+   if (err)
+   goto out;
 
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall

2014-03-07 Thread Zoltan Kiss

On 07/03/14 04:46, Pravin Shelar wrote:

On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss  wrote:

Do you have any feedback on this? I'm also adding KVM list as they might be
interested in this.

Zoli


On 28/02/14 19:16, Zoltan Kiss wrote:


The kernel datapath now switched to zerocopy Netlink messages, but that
also
means that the pages on frags array are sent straight to userspace. If
those
pages came outside the kernel, we have to swap them out with local copies.

Signed-off-by: Zoltan Kiss 


I do not think this is required, netlink zero copy only maps
pre-allocated buffers to user-space.

How do you mean "pre-allocated"? By who?

As far as I've seen the skb in this function came straight from the 
device (vif in our case), and skb_zerocopy just copy the frags to 
user_skb, which is sent to the userspace. Those frags contain pages from 
guest, and it's a bad idea to pass them to userspace: e.g if userspace 
dies in the meantime, what happens with them? Also, in Xen's case they 
are actually not mapped to userspace, so accessing them can lead to garbage.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall

2014-03-06 Thread Zoltan Kiss
Do you have any feedback on this? I'm also adding KVM list as they might 
be interested in this.


Zoli

On 28/02/14 19:16, Zoltan Kiss wrote:

The kernel datapath now switched to zerocopy Netlink messages, but that also
means that the pages on frags array are sent straight to userspace. If those
pages came outside the kernel, we have to swap them out with local copies.

Signed-off-by: Zoltan Kiss 
---
  net/openvswitch/datapath.c |6 ++
  1 file changed, 6 insertions(+)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 36f8872..ffb563c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
}
nla->nla_len = nla_attr_size(skb->len);

+   if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) {
+   err = -ENOMEM;
+   skb_tx_error(skb);
+   goto out;
+   }
+
skb_zerocopy(user_skb, skb, skb->len, hlen);

/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] openvswitch: orphan frags on local receive

2014-02-24 Thread Zoltan Kiss

Hi,

I'm also planning a similar patch, but it will call skb_orphan_frags on 
the skb in datapath.c::queue_userspace_packet, right before 
skb_zerocopy, so packets sent up to userspace via Netlink doesn't harm 
guests. I haven't checked your patch thoroughly, does it handle a 
different scenario?


Regards,

Zoltan Kiss

On 24/02/14 13:15, Qin Chuanyu wrote:

with vhost tx zero_copy, guest nic might get hang when host reserving
skb in socket queue delivered by guest, the case has been solved in
tun, it also been needed by openvswitch. This could easily happened
when a LAST_ACK state tcp occuring between guest and host.

Signed-off-by: Chuanyu Qin 
---
 net/openvswitch/vport-internal_dev.c |3 +++
 net/openvswitch/vport.c  |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c

index 729c687..adb25e2 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -212,6 +212,9 @@ static int internal_dev_recv(struct vport *vport, 
struct sk_buff *skb)

 struct net_device *netdev = netdev_vport_priv(vport)->dev;
 int len;

+if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+return -NET_RX_DROP;
+
 len = skb->len;

 skb_dst_drop(skb);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 208dd9a..04172d6 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -383,6 +383,7 @@ int ovs_vport_send(struct vport *vport, struct 
sk_buff *skb)

 u64_stats_update_end(&stats->syncp);
 } else if (sent < 0) {
 ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
+skb_tx_error(skb);
 kfree_skb(skb);
 } else
 ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-21 Thread Zoltan Kiss

On 20/02/14 20:01, Luis R. Rodriguez wrote:

On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss  wrote:

How about this: netback sets the root_block flag and a random MAC by
default. So the default behaviour won't change, DAD will be happy, and
userspace don't have to do anything unless it's using netback for STP root
bridge (I don't think there are too many toolstacks doing that), in which
case it has to remove the root_block flag instead of setting a random MAC.


:D that's exactly what I ended up proposing too. I mentioned how
xen-netback could do this as well, we'd keep or rename the flag I
added, and then the bridge could would look at it and enable the root
block if the flag is set. Stephen however does not like having the
bridge code look at magic flags for this behavior and would prefer for
us to get the tools to ask for the root block. Let's follow more up on
that thread
We don't need that new flag, just forget about it. Set that root_block 
flag from netback device init, around the time you generate the random 
MAC, or at the earliest possible time. Nothing else has to be done from 
kernel side. If someone wants netback to be a root port, then remove 
root_block from their tools, instead of changing the the MAC address, as 
it happens now.
Another problem with the random addresses, pointed out by Ian earlier, 
that when adding/removing interfaces, the bridge does recalculate it's 
MAC address, and choose the lowest one. In the general usecase I think 
that's normal, but in case of Xen networking, we would like to keep the 
bridge using the physical interface's MAC, because the local port of the 
bridge is used for Dom0 network traffic, therefore changing the bridge 
MAC when a netback device has lower MAC breaks that traffic. I think the 
best is to address this from userspace: if it set the MAC of the bridge 
explicitly, dev_set_mac_address() does dev->addr_assign_type = 
NET_ADDR_SET;, so br_stp_recalculate_bridge_id() will exit before 
changing anything.
And when I say userspace, I mean Xen specific tools which does 
networking configuration, e.g. xapi in XenServer case. Not brctl, it 
doesn't have to know whether this is a xenbrX device or a bridge used 
for another purposes.


Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-21 Thread Zoltan Kiss

On 20/02/14 20:24, Luis R. Rodriguez wrote:

On Thu, Feb 20, 2014 at 9:19 AM, Stephen Hemminger
 wrote:

On Wed, 19 Feb 2014 09:59:33 -0800 "Luis R. Rodriguez" 
 wrote:

On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger  
wrote:


Please only use the netlink/sysfs flags fields that already exist
for new features.


Sure, but what if we know a driver in most cases wants the root block
and we'd want to make it the default, thereby only requiring userspace
for toggling it off.


Something in userspace has to put the device into the bridge.
Fix the port setup in that tool via the netlink or sysfs flags in
the bridge. It should not have to be handled in the bridge looking
at magic flags in the device.


Agreed that's the best strategy and I'll work on sending patches to
brctl to enable the root_block preference. This approach however also
I don't think brctl should deal with any Xen specific stuff. I assume 
there is a misunderstanding in this thread: when I (and possibly other 
Xen folks) talk about "userspace" or "toolstack" here, I mean Xen 
specific tools which use e.g. brctl to set up bridges. Not brctl itself.

requires a userspace upgrade. I'm trying to see if we can get an
old-nasty-cryptic-hack practice removed from the kernel and we'd try
to prevent future drivers from using it -- without requiring userspace
upgrade. In this case the bad practice is to using a high static MAC
address for mimicking a root block default preference. In order to
remove that *without* requiring a userspace upgrade the dev->priv_flag
approach is the only thing I can think of. If this would go in we'd
replace the high static MAC address with a random MAC address to
prevent IPv6 SLAAC / DAD conflicts. I'd document this flag and
indicate with preference for userspace to be the one tuning these
knobs.

Without this we'd have to keep the high static MAC address on upstream
drivers and let userspace do the random'ization if it confirms the
userspace knob to turn the root block flag is available. Is the
priv_flag approach worth the compromise to remove the root block hack
practice?

   Luis



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-21 Thread Zoltan Kiss

On 20/02/14 20:39, Luis R. Rodriguez wrote:

On Wed, Feb 19, 2014 at 11:13 AM, Zoltan Kiss  wrote:

On 19/02/14 17:20, Luis R. Rodriguez wrote:

On 19/02/14 17:20, Luis R. Rodriguez also wrote:
Zoltan has noted though some use cases of IPv4 or IPv6 addresses on
backends though <...>


As discussed in the other threads though there *is* some use cases
of assigning IPv4 or IPv6 addresses to the backend interfaces though:
routing them (although its unclear to me if iptables can be used
instead, Zoltan?).


Not with OVS, it steals the packet before netfilter hooks.


Got it, thanks! Can't the route be added using a front-end IP address
instead on the host though ? I just tried that on a Xen system and it
seems to work. Perhaps I'm not understand the exact topology on the
routing case. So in my case I have the backend without any IPv4 or
IPv6 interfaces, the guest has IPv4, IPv6 addresses and even a TUN for
VPN and I can create routes on the host to the front end by not using
the backend device name but instead using the front-end target IP.

Check this how current Xen scripts does routed networking:

http://wiki.xen.org/wiki/Xen_Networking#Associating_routes_with_virtual_devices

Note, there are no bridges involved here! As the above page says, the 
backend has to have IP address, maybe it's not true anymore. I'm not too 
familiar with this setup too, I've used it only once.


Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-20 Thread Zoltan Kiss

On 19/02/14 16:45, Luis R. Rodriguez wrote:

On Mon, Feb 17, 2014 at 9:52 AM, Zoltan Kiss  wrote:

On 15/02/14 02:59, Luis R. Rodriguez wrote:


From: "Luis R. Rodriguez" 

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.


It is possible that a guest bridge together to VIF, either from the same
Dom0 bridge or from different ones. In that case using STP on VIFs sound
sensible to me.


You seem to describe a case whereby it can make sense for xen-netback
interfaces to end up becoming the root port of a bridge. Can you
elaborate a little more on that as it was unclear the use case.
Well, I might be wrong on that, but the scenario I was thinking: a guest 
(let's say domain 1) can have multiple interfaces on different Dom0 (or 
driver domain) bridges, let's say vif1.0 is plugged into xenbr0 and 
vif1.1 is in xenbr1. If the guest wants to make a bridge of this two, 
then using STP makes sense. I wanted to bring up CloudStack's virtual 
router as an example, but then I realized it's probably doesn't do such 
thing. However I don't think we should hardcode that a netback interface 
can't be RP ever.




Additionally if such cases exist then under the current upstream
implementation one would simply need to change the MAC address in
order to enable a vif to become the root port.  Stephen noted there is
a way to avoid nominating an interface for a root port through the
root block flag. We should use that instead of the MAC address hacks.
Let's keep in mind that part of the motivation for this series is to
avoid a duplicate IPv6 address left in place by use cases whereby the
MAC address of the backend vif was left static. The use case your are
explaining likely describes the more prevalent use case where address
conflicts can occur, perhaps when administrators for got to change the
backend MAC address. If we embrace a random MAC address we'd avoid
that issue, and but we'd need to update userspace to use the root
block on topologies where desired.
If I understand you correctly, this is the same I suggested in my 
another email sent 1.5 hour ago.


Zoli

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-20 Thread Zoltan Kiss

On 19/02/14 17:02, Luis R. Rodriguez wrote:

On Wed, Feb 19, 2014 at 6:35 AM, Zoltan Kiss  wrote:

On 19/02/14 09:52, Ian Campbell wrote:

Can't we arrange things in the Xen hotplug scripts such that if the
root_block stuff isn't available/doesn't work we fallback to the
existing fe:ff:ff:ff:ff usage?

That would avoid concerns about forward/backwards compat I think. It
wouldn't solve the issue you are targeting on old systems, but it also
doesn't regress them any further.


I agree, I think this problem could be better handled from userspace: if it
can set root_block then change the default MAC to a random one, if it can't,
then stay with the default one. Or if someone doesn't care about STP but DAD
is still important, userspace can have a force_random_mac option somewhere
to change to a random MAC regardless of root_block presence.


Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
upon initialization so that root block will be used once the device
gets added to a bridge. The purpose would be to avoid drivers from
using the high MAC address hack, streamline to use a random MAC
address thereby avoiding the possible duplicate address situation for
IPv6. In the STP use case for these interfaces we'd just require
userspace to unset the root block. I'd consider the STP use case the
most odd of all. The caveat to this approach is 3.8 would be needed
(or its the root block patches cherry picked) for base kernels older
than 3.8.


How about this: netback sets the root_block flag and a random MAC by 
default. So the default behaviour won't change, DAD will be happy, and 
userspace don't have to do anything unless it's using netback for STP 
root bridge (I don't think there are too many toolstacks doing that), in 
which case it has to remove the root_block flag instead of setting a 
random MAC.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-19 Thread Zoltan Kiss

On 19/02/14 17:20, Luis R. Rodriguez wrote:

On Wed, Feb 19, 2014 at 8:45 AM, Dan Williams  wrote:

On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:

On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams  wrote:

On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:

From: "Luis R. Rodriguez" 

Some interfaces do not need to have any IPv4 or IPv6
addresses, so enable an option to specify this. One
example where this is observed are virtualization
backend interfaces which just use the net_device
constructs to help with their respective frontends.

This should optimize boot time and complexity on
virtualization environments for each backend interface
while also avoiding triggering SLAAC and DAD, which is
simply pointless for these type of interfaces.


Would it not be better/cleaner to use disable_ipv6 and then add a
disable_ipv4 sysctl, then use those with that interface?


Sure, but note that the both disable_ipv6 and accept_dada sysctl
parameters are global. ipv4 and ipv6 interfaces are created upon
NETDEVICE_REGISTER, which will get triggered when a driver calls
register_netdev(). The goal of this patch was to enable an early
optimization for drivers that have no need ever for ipv4 or ipv6
interfaces.


Each interface gets override sysctls too though, eg:

/proc/sys/net/ipv6/conf/enp0s25/disable_ipv6


I hadn't seen those, thanks!


which is the one I meant; you're obviously right that the global ones
aren't what you want here.  But the specific ones should be suitable?


Under the approach Stephen mentioned by first ensuring the interface
is down yes. There's one use case I can consider to still want the
patch though, more on that below.


If you set that on a per-interface basis, then you'll get EPERM or
something whenever you try to add IPv6 addresses or do IPv6 routing.


Neat, thanks.


Zoltan has noted though some use cases of IPv4 or IPv6 addresses on
backends though, as such this is no longer applicable as a
requirement. The ipv4 sysctl however still seems like a reasonable
approach to enable optimizations of the network in topologies where
its known we won't need them but -- we'd need to consider a much more
granular solution, not just global as it is now for disable_ipv6, and
we'd also have to figure out a clean way to do this to not incur the
cost of early address interface addition upon register_netdev().

Given that we have a use case for ipv4 and ipv6 addresses on
xen-netback we no longer have an immediate use case for such early
optimization primitives though, so I'll drop this.


The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
already doing.


disable_ipv6 is global, the goal was to make this granular and skip
the cost upon early boot, but its been clarified we don't need this.


Like Stephen says, you need to make sure you set them before IFF_UP, but
beyond that, wouldn't the interface-specific sysctls work?


Yeah that'll do it, unless there is a measurable run time benefit cost
to never even add these in the first place. Consider a host with tons
of guests, not sure how many is 'a lot' these days. One would have to
measure the cost of reducing the amount of time it takes to boot these
up. As discussed in the other threads though there *is* some use cases
of assigning IPv4 or IPv6 addresses to the backend interfaces though:
routing them (although its unclear to me if iptables can be used
instead, Zoltan?).


Not with OVS, it steals the packet before netfilter hooks.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-19 Thread Zoltan Kiss

On 19/02/14 09:52, Ian Campbell wrote:

On Tue, 2014-02-18 at 13:02 -0800, Luis R. Rodriguez wrote:

On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
 wrote:

On Fri, 14 Feb 2014 18:59:37 -0800
"Luis R. Rodriguez"  wrote:


From: "Luis R. Rodriguez" 

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.

Device drivers that know they should never become part of the
root bridge have been using a trick of setting their MAC address
to a high broadcast MAC address such as FE:FF:FF:FF:FF:FF. Instead
of using these hacks lets the interfaces annotate its intent and
generalizes a solution for multiple drivers, while letting the
drivers use a random MAC address or one prefixed with a proper OUI.
This sort of hack is used by both qemu and xen for their backend
interfaces.

Cc: Stephen Hemminger 
Cc: bri...@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez 


This is already supported in a more standard way via the root
block flag.


Great! For documentation purposes the root_block flag is a sysfs
attribute, added via 3.8 through commit 1007dd1a. The respective
interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
bridge utility or through sysfs:

mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
/sys/devices/pci:00/:00:04.0/:02:00.0/net/eth0/brport/root_block
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
/sys/devices/virtual/net/vif3.0-emu/brport/root_block

mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
0
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set
dev vif3.0 root_block on
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
1

So if we'd want to avoid using the MAC address hack alternative to
skip a root port userspace would need to be updated to simply set this
attribute after adding the device to the bridge. Based on Zoltan's
feedback there seems to be use cases to not enable this always for all
xen-netback interfaces though as such we can just punt this to
userspace for the topologies that require this.

The original motivation for this series was to avoid the IPv6
duplicate address incurred by the MAC address hack for avoiding the
root bridge. Given that Zoltan also noted a use case whereby IPv4 and
IPv6 addresses can be assigned to the backend interfaces we should be
able to avoid the duplicate address situation for IPv6 by using a
proper random MAC address *once* userspace has been updated also to
use IFLA_BRPORT_PROTECT. New userspace can't and won't need to set
this flag for older kernels (older than 3.8) as root_block is not
implemented on those kernels and the MAC address hack would still be
used there. This strategy however does put a requirement on new
kernels to use new userspace as otherwise the MAC address workaround
would not be in place and root_block would not take effect.


Can't we arrange things in the Xen hotplug scripts such that if the
root_block stuff isn't available/doesn't work we fallback to the
existing fe:ff:ff:ff:ff usage?

That would avoid concerns about forward/backwards compat I think. It
wouldn't solve the issue you are targeting on old systems, but it also
doesn't regress them any further.


I agree, I think this problem could be better handled from userspace: if 
it can set root_block then change the default MAC to a random one, if it 
can't, then stay with the default one. Or if someone doesn't care about 
STP but DAD is still important, userspace can have a force_random_mac 
option somewhere to change to a random MAC regardless of root_block 
presence.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-17 Thread Zoltan Kiss

On 15/02/14 02:59, Luis R. Rodriguez wrote:

From: "Luis R. Rodriguez" 

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.


It is possible that a guest bridge together to VIF, either from the same 
Dom0 bridge or from different ones. In that case using STP on VIFs sound 
sensible to me.


Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

2014-02-17 Thread Zoltan Kiss

There is a valid scenario to put IP addresses on the backend VIFs:

http://wiki.xen.org/wiki/Xen_Networking#Routing

Also, the backend is not necessarily Dom0, you can connect twou guests 
with backend/frontend pairs.


Zoli

On 15/02/14 02:59, Luis R. Rodriguez wrote:

From: "Luis R. Rodriguez" 

The xen-netback driver is used only to provide a backend
interface for the frontend. The link is the only thing we
use, and that is used internally for letting us know when the
xen-netfront is ready, when it switches to XenbusStateConnected.

Note that only when the both the xen-netfront and xen-netback
are both in state XenbusStateConnected will xen-netback allow
userspace on the host (backend) to bring up the interface. Enabling
and disabling the interface will simply enable or disable NAPI
respectively, and that's used for IRQ communication set up with
the xen event channels.

Cc: Paul Durrant 
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: xen-de...@lists.xenproject.org
Cc: net...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez 
---
  drivers/net/xen-netback/interface.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index d380e3f..07e6fd2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -351,7 +351,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t 
domid,

eth_hw_addr_random(dev);
memcpy(dev->dev_addr, xen_oui, 3);
-   dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
+   dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;
netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);

netif_carrier_off(dev);



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html