Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-04-05 Thread Jesse Gross
On Mon, Apr 4, 2016 at 7:57 PM, pravin shelar  wrote:
> On Mon, Apr 4, 2016 at 1:56 PM, Jesse Gross  wrote:
>> On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar  wrote:
>>> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross  wrote:
>>>> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar  wrote:
>>>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>>>> index eb397e8..ae33d5e 100644
>>>>> --- a/datapath/linux/compat/stt.c
>>>>> +++ b/datapath/linux/compat/stt.c
>>>>>  static int coalesce_skb(struct sk_buff **headp)
>>>>>  {
>>>>> -   struct sk_buff *frag, *head, *prev;
>>>>> +   struct sk_buff *frag, *head;
>>>>> +#ifndef SKIP_ZERO_COPY
>>>>> +   struct sk_buff *prev;
>>>>> +#endif
>>>>> int err;
>>>>>
>>>>> err = straighten_frag_list(headp);

>>>> I don't think that we need to straighten the frag list in the
>>>> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
>>>> that forms the frag list will do. __skb_linearize() will be able to
>>>> handle it in any case.

>>> I can skip the straightening and work on the skb with a frag-list, but
>>> I do not want to complicate the code.  This case is pretty rare after
>>> the change in reassemble() where the complete skb is allocated on the
>>> first segment.
>>
>> I don't think that it should increase complexity - I believe that we
>> can just skip the call to straighten_frag_list() and everything else
>> will continue to work.
>>
> One difference is that the fragment iterator reads the list from
> skb->next, which is what straighten_frag_list() generates. If we skip
> the call, the list is in shinfo->frag_list.
> Let me see if this can be simplified a bit.

I think it's actually still OK as is. When there is a frag list, the
expectation is that the length, data_len, and truesize for all of the
individual fragments are summed in the main skb at the head of the
list. straighten_frag_list() actually needs to undo this. As a result,
the fact that the frag_list iterator won't descend into the inner
frag_lists shouldn't be a problem. I think when we get to the
assignment of skb_shinfo(head)->frag_list, we can just stick
head->next onto the end of the list if there is any existing
frag_list.
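
As a minimal sketch of that idea (a hypothetical helper, not code from
the patch; it assumes the per-fragment len, data_len, and truesize have
already been summed into the head skb, as described above):

#include <linux/skbuff.h>

/* Hypothetical sketch: hang head->next off the end of any existing
 * frag_list rather than requiring a fully straightened list.
 */
static void append_frags_to_frag_list(struct sk_buff *head)
{
	struct sk_buff *tail = skb_shinfo(head)->frag_list;

	if (!tail) {
		skb_shinfo(head)->frag_list = head->next;
	} else {
		/* Walk to the end of the existing frag_list and splice
		 * the head->next chain onto it.
		 */
		while (tail->next)
			tail = tail->next;
		tail->next = head->next;
	}
	head->next = NULL;
}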


Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-04-04 Thread pravin shelar
On Mon, Apr 4, 2016 at 1:56 PM, Jesse Gross  wrote:
> On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar  wrote:
>> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross  wrote:
>>> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar  wrote:
>>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>>> index eb397e8..ae33d5e 100644
>>>> --- a/datapath/linux/compat/stt.c
>>>> +++ b/datapath/linux/compat/stt.c
>>>>  static int coalesce_skb(struct sk_buff **headp)
>>>>  {
>>>> -   struct sk_buff *frag, *head, *prev;
>>>> +   struct sk_buff *frag, *head;
>>>> +#ifndef SKIP_ZERO_COPY
>>>> +   struct sk_buff *prev;
>>>> +#endif
>>>> int err;
>>>>
>>>> err = straighten_frag_list(headp);
>>>
>>> I don't think that we need to straighten the frag list in the
>>> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
>>> that forms the frag list will do. __skb_linearize() will be able to
>>> handle it in any case.
>>>
>> I can skip the straightening and work on the skb with a frag-list, but
>> I do not want to complicate the code.  This case is pretty rare after
>> the change in reassemble() where the complete skb is allocated on the
>> first segment.
>
> I don't think that it should increase complexity - I believe that we
> can just skip the call to straighten_frag_list() and everything else
> will continue to work.
>
One difference is that the fragment iterator reads the list from
skb->next, which is what straighten_frag_list() generates. If we skip
the call, the list is in shinfo->frag_list.
Let me see if this can be simplified a bit.
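
As a rough sketch of the two list shapes in question (a hypothetical
helper, not code from the patch; it ignores any nested frag-lists):
after straighten_frag_list() the fragments hang off head->next,
otherwise they are only reachable through skb_shinfo(head)->frag_list.

#include <linux/skbuff.h>

/* Hypothetical sketch of the two fragment-list shapes discussed above. */
static unsigned int count_fragments(struct sk_buff *head, bool straightened)
{
	struct sk_buff *frag = straightened ? head->next
					    : skb_shinfo(head)->frag_list;
	unsigned int n = 0;

	for (; frag; frag = frag->next)
		n++;
	return n;
}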

>>>> +#ifdef SKIP_ZERO_COPY
>>>> +   if (skb_shinfo(head)->frag_list) {
>>>> +   err = __skb_linearize(head);
>>>> +   return err;
>>>> +   }
>>>> +#endif
>>>
>>> Maybe move this to try_to_segment()? It seems like it is a little more
>>> consistent.
>>>
>> But it is not the same. I think we only need to linearize if there is
>> a frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
>> in skb_segment().
>
> OK, that's true.
>
>>> This is effectively optimizing for the case where all packets are
>>> large and will generate a frag list after reassembly. However, in many
>>> situations packets can be reassembled with zero copy into a packet
>>> that doesn't later need to be split (maybe the total packet is around
>>> 20k). Do you know how the performance compares vs. just linearizing at
>>> the end in that case? Is there a benefit to doing an early copy?
>>>
>> We are not trying to coalesce the skb in the SKIP_ZERO_COPY case, so
>> any partial skb segment would generate multiple inner segments. I have
>> seen better performance with this approach than with linearizing at
>> the end.
>
> Do you know how the performance compares for medium sized packets
> where we would be able to coalesce without linearizing? It would be an
> interesting intermediate datapoint between very small and very large
> packets.

ok, I will check it.


Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-04-04 Thread Jesse Gross
On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar  wrote:
> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross  wrote:
>> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar  wrote:
>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>> index eb397e8..ae33d5e 100644
>>> --- a/datapath/linux/compat/stt.c
>>> +++ b/datapath/linux/compat/stt.c
>>>  static int coalesce_skb(struct sk_buff **headp)
>>>  {
>>> -   struct sk_buff *frag, *head, *prev;
>>> +   struct sk_buff *frag, *head;
>>> +#ifndef SKIP_ZERO_COPY
>>> +   struct sk_buff *prev;
>>> +#endif
>>> int err;
>>>
>>> err = straighten_frag_list(headp);
>>
>> I don't think that we need to straighten the frag list in the
>> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
>> that forms the frag list will do. __skb_linearize() will be able to
>> handle it in any case.
>>
> I can skip the straightening and work on the skb with a frag-list, but
> I do not want to complicate the code.  This case is pretty rare after
> the change in reassemble() where the complete skb is allocated on the
> first segment.

I don't think that it should increase complexity - I believe that we
can just skip the call to straighten_frag_list() and everything else
will continue to work.

>>> +#ifdef SKIP_ZERO_COPY
>>> +   if (skb_shinfo(head)->frag_list) {
>>> +   err = __skb_linearize(head);
>>> +   return err;
>>> +   }
>>> +#endif
>>
>> Maybe move this to try_to_segment()? It seems like it is a little more
>> consistent.
>>
> But it is not the same. I think we only need to linearize if there is
> a frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
> in skb_segment().

OK, that's true.

>> This is effectively optimizing for the case where all packets are
>> large and will generate a frag list after reassembly. However, in many
>> situations packets can be reassembled with zero copy into a packet
>> that doesn't later need to be split (maybe the total packet is around
>> 20k). Do you know how the performance compares vs. just linearizing at
>> the end in that case? Is there a benefit to doing an early copy?
>>
> We are not trying to coalesce the skb in the SKIP_ZERO_COPY case, so
> any partial skb segment would generate multiple inner segments. I have
> seen better performance with this approach than with linearizing at
> the end.

Do you know how the performance compares for medium sized packets
where we would be able to coalesce without linearizing? It would be an
interesting intermediate datapoint between very small and very large
packets.


Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-04-01 Thread pravin shelar
On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross  wrote:
> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar  wrote:
>> In the STT implementation we saw performance improvements from
>> linearizing the skb in the SLUB case.  So the following patch skips
>> the zero-copy operation in that case.
>>
>> Tested-By: Vasmi Abidi 
>> Signed-off-by: Pravin B Shelar 
>
> Can you add some performance numbers to the commit message?
>
ok.

>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> index eb397e8..ae33d5e 100644
>> --- a/datapath/linux/compat/stt.c
>> +++ b/datapath/linux/compat/stt.c
>>  static int coalesce_skb(struct sk_buff **headp)
>>  {
>> -   struct sk_buff *frag, *head, *prev;
>> +   struct sk_buff *frag, *head;
>> +#ifndef SKIP_ZERO_COPY
>> +   struct sk_buff *prev;
>> +#endif
>> int err;
>>
>> err = straighten_frag_list(headp);
>
> I don't think that we need to straighten the frag list in the
> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
> that forms the frag list will do. __skb_linearize() will be able to
> handle it in any case.
>
I can skip the straightening and work on the skb with a frag-list, but
I do not want to complicate the code.  This case is pretty rare after
the change in reassemble() where the complete skb is allocated on the
first segment.

>> +#ifdef SKIP_ZERO_COPY
>> +   if (skb_shinfo(head)->frag_list) {
>> +   err = __skb_linearize(head);
>> +   return err;
>> +   }
>> +#endif
>
> Maybe move this to try_to_segment()? It seems like it is a little more
> consistent.
>
But it is not the same. I think we only need to linearize if there is
a frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
in skb_segment().
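
A minimal sketch of that distinction (a hypothetical helper, not code
from the patch; it assumes only that skb_segment() copes with paged
frags[] but not with a frag-list):

#include <linux/skbuff.h>

/* Hypothetical sketch: paged data in skb_shinfo(skb)->frags[] can go
 * straight to segmentation, so only the presence of a frag_list
 * forces a full linearize under SKIP_ZERO_COPY.
 */
static int linearize_if_frag_list(struct sk_buff *skb)
{
	if (skb_shinfo(skb)->frag_list)
		return __skb_linearize(skb);
	return 0;
}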

>>  static int segment_skb(struct sk_buff **headp, bool csum_partial,
>>bool ipv4, bool tcp, int l4_offset)
>>  {
>> +#ifndef SKIP_ZERO_COPY
>> int err;
>>
>> err = coalesce_skb(headp);
>> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
>> if (skb_shinfo(*headp)->frag_list)
>> return __try_to_segment(*headp, csum_partial,
>> ipv4, tcp, l4_offset);
>> +#endif
>> return 0;
>>  }
>
> Is this OK? It will retain a skb with a frag_list on the transmit path
> where this wasn't possible before. This used to cause kernel panics
> since the skb geometry wasn't even when the packet went to be
> segmented. I don't remember if this is still the case but if not then
> it seems like we should be able to simplify the code regardless of this
> patch.
>
right, I missed it. I will keep the segmentation here.

>> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>> frag = lookup_frag(dev_net(skb->dev), stt_percpu, &key, hash);
>> if (!frag->skbs) {
>> +   int err;
>> +
>> +   err = pskb_expand_head(skb, skb_headroom(skb),
>> +  skb->data_len + tot_len, GFP_ATOMIC);
>> +   if (likely(!err)) {
>> +   if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
>> +   BUG();
>> +   }
>
> This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
> we probably don't want to do that. It's also possible that the first
> skb received isn't necessarily the first packet in the reassembled
> packet.
>
ok, I will fix it.

> This is effectively optimizing for the case where all packets are
> large and will generate a frag list after reassembly. However, in many
> situations packets can be reassembled with zero copy into a packet
> that doesn't later need to be split (maybe the total packet is around
> 20k). Do you know how the performance compares vs. just linearizing at
> the end in that case? Is there a benefit to doing an early copy?
>
We are not trying to coalesce the skb in the SKIP_ZERO_COPY case, so
any partial skb segment would generate multiple inner segments. I have
seen better performance with this approach than with linearizing at
the end.

>> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>> FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
>> FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
>> -   FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> -   stt_percpu->frag_mem_used += skb->truesize;
>> +   if (!copied_skb) {
>> +   FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> +   stt_percpu->frag_mem_used += skb->truesize;
>> +   }
>
> pskb_expand_head() doesn't actually change the truesize so our count
> will be more inaccurate than before. However, we don't have to worry
> about the attack case of very small packets consuming a large amount
> of memory due to having many copies of struct sk_buff.

ok. I will 

Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-03-31 Thread Jesse Gross
On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar  wrote:
> In the STT implementation we saw performance improvements from
> linearizing the skb in the SLUB case.  So the following patch skips
> the zero-copy operation in that case.
>
> Tested-By: Vasmi Abidi 
> Signed-off-by: Pravin B Shelar 

Can you add some performance numbers to the commit message?

> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
> index eb397e8..ae33d5e 100644
> --- a/datapath/linux/compat/stt.c
> +++ b/datapath/linux/compat/stt.c
>  static int coalesce_skb(struct sk_buff **headp)
>  {
> -   struct sk_buff *frag, *head, *prev;
> +   struct sk_buff *frag, *head;
> +#ifndef SKIP_ZERO_COPY
> +   struct sk_buff *prev;
> +#endif
> int err;
>
> err = straighten_frag_list(headp);

I don't think that we need to straighten the frag list in the
SKIP_ZERO_COPY case. It's basically just undoing what the for loop
that forms the frag list will do. __skb_linearize() will be able to
handle it in any case.

> +#ifdef SKIP_ZERO_COPY
> +   if (skb_shinfo(head)->frag_list) {
> +   err = __skb_linearize(head);
> +   return err;
> +   }
> +#endif

Maybe move this to try_to_segment()? It seems like it is a little more
consistent.

>  static int segment_skb(struct sk_buff **headp, bool csum_partial,
>bool ipv4, bool tcp, int l4_offset)
>  {
> +#ifndef SKIP_ZERO_COPY
> int err;
>
> err = coalesce_skb(headp);
> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
> if (skb_shinfo(*headp)->frag_list)
> return __try_to_segment(*headp, csum_partial,
> ipv4, tcp, l4_offset);
> +#endif
> return 0;
>  }

Is this OK? It will retain a skb with a frag_list on the transmit path
where this wasn't possible before. This used to cause kernel panics
since the skb geometry wasn't even when the packet went to be
segmented. I don't remember if this is still the case but if not then
it seems like we should be able to simplify the code regardless of this
patch.

> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>
> frag = lookup_frag(dev_net(skb->dev), stt_percpu, &key, hash);
> if (!frag->skbs) {
> +   int err;
> +
> +   err = pskb_expand_head(skb, skb_headroom(skb),
> +  skb->data_len + tot_len, GFP_ATOMIC);
> +   if (likely(!err)) {
> +   if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
> +   BUG();
> +   }

This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
we probably don't want to do that. It's also possible that the first
skb received isn't necessarily the first packet in the reassembled
packet.

This is effectively optimizing for the case where all packets are
large and will generate a frag list after reassembly. However, in many
situations packets can be reassembled with zero copy into a packet
that doesn't later need to be split (maybe the total packet is around
20k). Do you know how the performance compares vs. just linearizing at
the end in that case? Is there a benefit to doing an early copy?

> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>
> FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
> FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
> -   FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
> -   stt_percpu->frag_mem_used += skb->truesize;
> +   if (!copied_skb) {
> +   FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
> +   stt_percpu->frag_mem_used += skb->truesize;
> +   }

pskb_expand_head() doesn't actually change the truesize so our count
will be more inaccurate than before. However, we don't have to worry
about the attack case of very small packets consuming a large amount
of memory due to having many copies of struct sk_buff.


[ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

2016-03-31 Thread Pravin B Shelar
In the STT implementation we saw performance improvements from
linearizing the skb in the SLUB case.  So the following patch skips the
zero-copy operation in that case.

Tested-By: Vasmi Abidi 
Signed-off-by: Pravin B Shelar 
---
 datapath/linux/compat/stt.c | 77 +
 1 file changed, 71 insertions(+), 6 deletions(-)

diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index eb397e8..ae33d5e 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -49,6 +49,15 @@
 #define STT_DST_PORT 7471
 
 #ifdef OVS_STT
+#ifdef CONFIG_SLUB
+/*
+ * We saw better performance with zero copy skipped in the SLUB case,
+ * so skip it there.
+ */
+#define SKIP_ZERO_COPY
+#endif
+
+
 #define STT_VER 0
 
 /* @list: Per-net list of STT ports.
@@ -286,6 +295,7 @@ static int straighten_frag_list(struct sk_buff **skbp)
return 0;
 }
 
+#ifndef SKIP_ZERO_COPY
 static void copy_skb_metadata(struct sk_buff *to, struct sk_buff *from)
 {
to->protocol = from->protocol;
@@ -464,10 +474,14 @@ static int skb_list_segment(struct sk_buff *head, bool ipv4, int l4_offset)
update_headers(head, true, l4_offset, hdr_len, ipv4, 0);
return 0;
 }
+#endif
 
 static int coalesce_skb(struct sk_buff **headp)
 {
-   struct sk_buff *frag, *head, *prev;
+   struct sk_buff *frag, *head;
+#ifndef SKIP_ZERO_COPY
+   struct sk_buff *prev;
+#endif
int err;
 
err = straighten_frag_list(headp);
@@ -475,6 +489,7 @@ static int coalesce_skb(struct sk_buff **headp)
return err;
head = *headp;
 
+#ifndef SKIP_ZERO_COPY
/* Coalesce frag list. */
prev = head;
for (frag = head->next; frag; frag = frag->next) {
@@ -500,6 +515,7 @@ static int coalesce_skb(struct sk_buff **headp)
if (!head->next)
return 0;
 
+#endif
for (frag = head->next; frag; frag = frag->next) {
head->len += frag->len;
head->data_len += frag->len;
@@ -508,9 +524,16 @@ static int coalesce_skb(struct sk_buff **headp)
 
skb_shinfo(head)->frag_list = head->next;
head->next = NULL;
+#ifdef SKIP_ZERO_COPY
+   if (skb_shinfo(head)->frag_list) {
+   err = __skb_linearize(head);
+   return err;
+   }
+#endif
return 0;
 }
 
+#ifndef SKIP_ZERO_COPY
 static int __try_to_segment(struct sk_buff *skb, bool csum_partial,
bool ipv4, bool tcp, int l4_offset)
 {
@@ -519,9 +542,16 @@ static int __try_to_segment(struct sk_buff *skb, bool csum_partial,
else
return skb_linearize(skb);
 }
+#endif
 
 static int try_to_segment(struct sk_buff *skb)
 {
+#ifdef SKIP_ZERO_COPY
+   /* Since coalesce_skb() does not generate a frag-list, there is
+* no need to linearize here.
+*/
+   return 0;
+#else
struct stthdr *stth = stt_hdr(skb);
bool csum_partial = !!(stth->flags & STT_CSUM_PARTIAL);
bool ipv4 = !!(stth->flags & STT_PROTO_IPV4);
@@ -529,11 +559,13 @@ static int try_to_segment(struct sk_buff *skb)
int l4_offset = stth->l4_offset;
 
return __try_to_segment(skb, csum_partial, ipv4, tcp, l4_offset);
+#endif
 }
 
 static int segment_skb(struct sk_buff **headp, bool csum_partial,
   bool ipv4, bool tcp, int l4_offset)
 {
+#ifndef SKIP_ZERO_COPY
int err;
 
err = coalesce_skb(headp);
@@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
if (skb_shinfo(*headp)->frag_list)
return __try_to_segment(*headp, csum_partial,
ipv4, tcp, l4_offset);
+#endif
return 0;
 }
 
@@ -1054,13 +1087,28 @@ static struct pkt_frag *lookup_frag(struct net *net,
return victim_frag;
 }
 
+#ifdef SKIP_ZERO_COPY
+static int __copy_skb(struct sk_buff *to, struct sk_buff *from)
+{
+   if (to->next)
+   return -EINVAL;
+
+   if (unlikely(to->data_len || (from->len > skb_tailroom(to))))
+   return -EINVAL;
+   skb_copy_bits(from, 0, skb_put(to, from->len), from->len);
+   return 0;
+}
+#else
+#define __copy_skb(a, b)   -EINVAL
+#endif
+
 static struct sk_buff *reassemble(struct sk_buff *skb)
 {
struct iphdr *iph = ip_hdr(skb);
struct tcphdr *tcph = tcp_hdr(skb);
u32 seq = ntohl(tcph->seq);
struct stt_percpu *stt_percpu;
-   struct sk_buff *last_skb;
+   struct sk_buff *last_skb, *copied_skb = NULL;
struct pkt_frag *frag;
struct pkt_key key;
int tot_len;
@@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
 
frag = lookup_frag(dev_net(skb->dev), stt_percpu, &key, hash);
if (!frag->skbs) {
+   int err;
+
+   err = pskb_expand_head(skb, skb_headroom(skb),
+  skb->data_len + tot_len,