Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
On Mon, Apr 4, 2016 at 7:57 PM, pravin shelar wrote:
> On Mon, Apr 4, 2016 at 1:56 PM, Jesse Gross wrote:
>> On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar wrote:
>>> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross wrote:

On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar wrote:
> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
> index eb397e8..ae33d5e 100644
> --- a/datapath/linux/compat/stt.c
> +++ b/datapath/linux/compat/stt.c
> static int coalesce_skb(struct sk_buff **headp)
> {
> -        struct sk_buff *frag, *head, *prev;
> +        struct sk_buff *frag, *head;
> +#ifndef SKIP_ZERO_COPY
> +        struct sk_buff *prev;
> +#endif
>         int err;
>
>         err = straighten_frag_list(headp);

I don't think that we need to straighten the frag list in the
SKIP_ZERO_COPY case. It's basically just undoing what the for loop
that forms the frag list will do. __skb_linearize() will be able to
handle it in any case.

>>> I can skip straightening and work on an skb with a frag-list, but I do
>>> not want to complicate the code. This case is pretty rare after the
>>> change in reassembly where the complete skb is allocated on the first
>>> segment.
>>
>> I don't think that it should increase complexity - I believe that we
>> can just skip the call to straighten_frag_list() and everything else
>> will continue to work.
>>
> One difference is that the fragments iterator reads the list from
> skb->next, which is what straighten_frag_list() generates. If we skip
> the call, the list is in shinfo->frag_list.
> Let me see if this can be simplified a bit.

I think it's actually still OK as is. When there is a frag list, the
expectation is that the length, data_len, and truesize for all of the
individual fragments are summed in the main skb at the head of the
list. normalize_frag_list() actually needs to undo this. As a result,
the fact that the frag_list iterator won't descend into the inner
frag_lists shouldn't be a problem.

I think when we get to the assignment of skb_shinfo(head)->frag_list,
we can just stick head->next onto the end of the list if there is any
existing frag_list.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
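The handoff Jesse suggests — summing the chained lengths into the head and then hanging head->next off the frag_list, appending to any fragments already there — can be modeled in plain C. This is an illustrative userspace sketch; `mini_skb` and its fields are stand-ins for the kernel's struct sk_buff, not real kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct mini_skb {
        int len;                     /* total bytes owned by this skb */
        int data_len;                /* bytes held in fragments       */
        struct mini_skb *next;       /* ->next chain from reassembly  */
        struct mini_skb *frag_list;  /* fragment list of this skb     */
};

static void coalesce_to_frag_list(struct mini_skb *head)
{
        struct mini_skb *frag, **tail;

        /* Sum fragment lengths into the head, as the frag_list
         * convention expects. */
        for (frag = head->next; frag; frag = frag->next) {
                head->len += frag->len;
                head->data_len += frag->len;
        }

        /* Stick head->next onto the end of any existing frag_list. */
        tail = &head->frag_list;
        while (*tail)
                tail = &(*tail)->next;
        *tail = head->next;
        head->next = NULL;
}
```

With an empty frag_list this degenerates to the plain assignment `skb_shinfo(head)->frag_list = head->next` that the patch already performs; the tail walk only matters in the case Jesse raises, where a frag_list already exists.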
Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
On Mon, Apr 4, 2016 at 1:56 PM, Jesse Gross wrote:
> On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar wrote:
>> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross wrote:
>>> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar wrote:

diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index eb397e8..ae33d5e 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
static int coalesce_skb(struct sk_buff **headp)
{
-        struct sk_buff *frag, *head, *prev;
+        struct sk_buff *frag, *head;
+#ifndef SKIP_ZERO_COPY
+        struct sk_buff *prev;
+#endif
        int err;

        err = straighten_frag_list(headp);

>>> I don't think that we need to straighten the frag list in the
>>> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
>>> that forms the frag list will do. __skb_linearize() will be able to
>>> handle it in any case.
>>>
>> I can skip straightening and work on an skb with a frag-list, but I do
>> not want to complicate the code. This case is pretty rare after the
>> change in reassembly where the complete skb is allocated on the first
>> segment.
>
> I don't think that it should increase complexity - I believe that we
> can just skip the call to straighten_frag_list() and everything else
> will continue to work.
>
One difference is that the fragments iterator reads the list from
skb->next, which is what straighten_frag_list() generates. If we skip
the call, the list is in shinfo->frag_list.
Let me see if this can be simplified a bit.

+#ifdef SKIP_ZERO_COPY
+        if (skb_shinfo(head)->frag_list) {
+                err = __skb_linearize(head);
+                return err;
+        }
+#endif

>>> Maybe move this to try_to_segment()? It seems like it is a little more
>>> consistent.
>>>
>> But it is not the same. I think we only need to linearize if there is a
>> frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
>> in skb_segment.
>
> OK, that's true.
>
>>> This is effectively optimizing for the case where all packets are
>>> large and will generate a frag list after reassembly. However, it's
>>> possible in many situations that packets can be reassembled with zero
>>> copy that doesn't later need to be split (maybe the total packet is
>>> around 20k). Do you know how the performance compares vs. just
>>> linearizing at the end in that case? Is there a benefit to doing an
>>> early copy?
>>>
>> We are not trying to coalesce skb in case of SKIP_ZERO_COPY, so any
>> partial skb segment will generate multiple inner segments. I have seen
>> better performance with this approach rather than linearizing at the
>> end.
>
> Do you know how the performance compares for medium sized packets
> where we would be able to coalesce without linearizing? It would be an
> interesting intermediate datapoint between very small and very large
> packets.

ok, I will check it.
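The copy that the SKIP_ZERO_COPY path pays only when a frag_list exists is essentially what __skb_linearize() does: gather the head buffer plus every fragment into one contiguous area. A hedged userspace model of that gather step, with plain byte buffers standing in for skb data (none of these names come from stt.c):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct flat_skb {
        const unsigned char *data;   /* linear payload           */
        int len;                     /* bytes in this buffer     */
        struct flat_skb *frag_list;  /* head of fragment chain   */
        struct flat_skb *next;       /* links fragments together */
};

/* Copy the head buffer and every frag_list buffer into one new
 * contiguous allocation, mirroring the effect of linearization. */
static unsigned char *linearize(const struct flat_skb *head, int *out_len)
{
        const struct flat_skb *f;
        unsigned char *buf, *p;
        int total = head->len;

        for (f = head->frag_list; f; f = f->next)
                total += f->len;

        buf = malloc(total);
        if (!buf)
                return NULL;

        p = buf;
        memcpy(p, head->data, head->len);
        p += head->len;
        for (f = head->frag_list; f; f = f->next) {
                memcpy(p, f->data, f->len);
                p += f->len;
        }
        *out_len = total;
        return buf;
}
```

This also makes the tradeoff in the discussion concrete: when no frag_list forms (the medium-sized-packet case Jesse asks about), the model does no copying at all, while a frag_list costs one full pass over the payload.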
Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
On Fri, Apr 1, 2016 at 4:58 PM, pravin shelar wrote:
> On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross wrote:
>> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar wrote:
>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>> index eb397e8..ae33d5e 100644
>>> --- a/datapath/linux/compat/stt.c
>>> +++ b/datapath/linux/compat/stt.c
>>> static int coalesce_skb(struct sk_buff **headp)
>>> {
>>> -        struct sk_buff *frag, *head, *prev;
>>> +        struct sk_buff *frag, *head;
>>> +#ifndef SKIP_ZERO_COPY
>>> +        struct sk_buff *prev;
>>> +#endif
>>>         int err;
>>>
>>>         err = straighten_frag_list(headp);
>>
>> I don't think that we need to straighten the frag list in the
>> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
>> that forms the frag list will do. __skb_linearize() will be able to
>> handle it in any case.
>>
> I can skip straightening and work on an skb with a frag-list, but I do
> not want to complicate the code. This case is pretty rare after the
> change in reassembly where the complete skb is allocated on the first
> segment.

I don't think that it should increase complexity - I believe that we
can just skip the call to straighten_frag_list() and everything else
will continue to work.

>>> +#ifdef SKIP_ZERO_COPY
>>> +        if (skb_shinfo(head)->frag_list) {
>>> +                err = __skb_linearize(head);
>>> +                return err;
>>> +        }
>>> +#endif
>>
>> Maybe move this to try_to_segment()? It seems like it is a little more
>> consistent.
>>
> But it is not the same. I think we only need to linearize if there is a
> frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
> in skb_segment.

OK, that's true.

>> This is effectively optimizing for the case where all packets are
>> large and will generate a frag list after reassembly. However, it's
>> possible in many situations that packets can be reassembled with zero
>> copy that doesn't later need to be split (maybe the total packet is
>> around 20k). Do you know how the performance compares vs. just
>> linearizing at the end in that case? Is there a benefit to doing an
>> early copy?
>>
> We are not trying to coalesce skb in case of SKIP_ZERO_COPY, so any
> partial skb segment will generate multiple inner segments. I have seen
> better performance with this approach rather than linearizing at the
> end.

Do you know how the performance compares for medium sized packets
where we would be able to coalesce without linearizing? It would be an
interesting intermediate datapoint between very small and very large
packets.
Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross wrote:
> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar wrote:
>> In the STT implementation we saw performance improvements from
>> linearizing the skb in the SLUB case, so the following patch skips the
>> zero-copy operation in that case.
>>
>> Tested-By: Vasmi Abidi
>> Signed-off-by: Pravin B Shelar
>
> Can you add some performance numbers to the commit message?
>
ok.

>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> index eb397e8..ae33d5e 100644
>> --- a/datapath/linux/compat/stt.c
>> +++ b/datapath/linux/compat/stt.c
>> static int coalesce_skb(struct sk_buff **headp)
>> {
>> -        struct sk_buff *frag, *head, *prev;
>> +        struct sk_buff *frag, *head;
>> +#ifndef SKIP_ZERO_COPY
>> +        struct sk_buff *prev;
>> +#endif
>>         int err;
>>
>>         err = straighten_frag_list(headp);
>
> I don't think that we need to straighten the frag list in the
> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
> that forms the frag list will do. __skb_linearize() will be able to
> handle it in any case.
>
I can skip straightening and work on an skb with a frag-list, but I do
not want to complicate the code. This case is pretty rare after the
change in reassembly where the complete skb is allocated on the first
segment.

>> +#ifdef SKIP_ZERO_COPY
>> +        if (skb_shinfo(head)->frag_list) {
>> +                err = __skb_linearize(head);
>> +                return err;
>> +        }
>> +#endif
>
> Maybe move this to try_to_segment()? It seems like it is a little more
> consistent.
>
But it is not the same. I think we only need to linearize if there is a
frag-list; AFAIK non-linear data in skb_shinfo can be safely handled
in skb_segment.

>> static int segment_skb(struct sk_buff **headp, bool csum_partial,
>>                        bool ipv4, bool tcp, int l4_offset)
>> {
>> +#ifndef SKIP_ZERO_COPY
>>         int err;
>>
>>         err = coalesce_skb(headp);
>> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
>>         if (skb_shinfo(*headp)->frag_list)
>>                 return __try_to_segment(*headp, csum_partial,
>>                                         ipv4, tcp, l4_offset);
>> +#endif
>>         return 0;
>> }
>
> Is this OK? It will retain an skb with a frag_list on the transmit path
> where this wasn't possible before. This used to cause kernel panics
> since the skb geometry wasn't even when the packet went to be
> segmented. I don't remember if this is still the case, but if not then
> it seems like we should be able to simplify the code regardless of this
> patch.
>
right, I missed it. I will keep the segmentation here.

>> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>>         frag = lookup_frag(dev_net(skb->dev), stt_percpu, , hash);
>>         if (!frag->skbs) {
>> +                int err;
>> +
>> +                err = pskb_expand_head(skb, skb_headroom(skb),
>> +                                       skb->data_len + tot_len, GFP_ATOMIC);
>> +                if (likely(!err)) {
>> +                        if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
>> +                                BUG();
>> +                }
>
> This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
> we probably don't want to do that. It's also possible that the first
> skb received isn't necessarily the first packet in the reassembled
> packet.
>
ok, I will fix it.

> This is effectively optimizing for the case where all packets are
> large and will generate a frag list after reassembly. However, it's
> possible in many situations that packets can be reassembled with zero
> copy that doesn't later need to be split (maybe the total packet is
> around 20k). Do you know how the performance compares vs. just
> linearizing at the end in that case? Is there a benefit to doing an
> early copy?
>
We are not trying to coalesce skb in case of SKIP_ZERO_COPY, so any
partial skb segment will generate multiple inner segments. I have seen
better performance with this approach rather than linearizing at the
end.

>> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>>         FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
>>         FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
>> -        FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> -        stt_percpu->frag_mem_used += skb->truesize;
>> +        if (!copied_skb) {
>> +                FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> +                stt_percpu->frag_mem_used += skb->truesize;
>> +        }
>
> pskb_expand_head() doesn't actually change the truesize so our count
> will be more inaccurate than before. However, we don't have to worry
> about the attack case of very small packets consuming a large amount
> of memory due to having many copies of struct sk_buff.

ok. I will
Re: [ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar wrote:
> In the STT implementation we saw performance improvements from
> linearizing the skb in the SLUB case, so the following patch skips the
> zero-copy operation in that case.
>
> Tested-By: Vasmi Abidi
> Signed-off-by: Pravin B Shelar

Can you add some performance numbers to the commit message?

> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
> index eb397e8..ae33d5e 100644
> --- a/datapath/linux/compat/stt.c
> +++ b/datapath/linux/compat/stt.c
> static int coalesce_skb(struct sk_buff **headp)
> {
> -        struct sk_buff *frag, *head, *prev;
> +        struct sk_buff *frag, *head;
> +#ifndef SKIP_ZERO_COPY
> +        struct sk_buff *prev;
> +#endif
>         int err;
>
>         err = straighten_frag_list(headp);

I don't think that we need to straighten the frag list in the
SKIP_ZERO_COPY case. It's basically just undoing what the for loop
that forms the frag list will do. __skb_linearize() will be able to
handle it in any case.

> +#ifdef SKIP_ZERO_COPY
> +        if (skb_shinfo(head)->frag_list) {
> +                err = __skb_linearize(head);
> +                return err;
> +        }
> +#endif

Maybe move this to try_to_segment()? It seems like it is a little more
consistent.

> static int segment_skb(struct sk_buff **headp, bool csum_partial,
>                        bool ipv4, bool tcp, int l4_offset)
> {
> +#ifndef SKIP_ZERO_COPY
>         int err;
>
>         err = coalesce_skb(headp);
> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
>         if (skb_shinfo(*headp)->frag_list)
>                 return __try_to_segment(*headp, csum_partial,
>                                         ipv4, tcp, l4_offset);
> +#endif
>         return 0;
> }

Is this OK? It will retain an skb with a frag_list on the transmit path
where this wasn't possible before. This used to cause kernel panics
since the skb geometry wasn't even when the packet went to be
segmented. I don't remember if this is still the case, but if not then
it seems like we should be able to simplify the code regardless of this
patch.

> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>
>         frag = lookup_frag(dev_net(skb->dev), stt_percpu, , hash);
>         if (!frag->skbs) {
> +                int err;
> +
> +                err = pskb_expand_head(skb, skb_headroom(skb),
> +                                       skb->data_len + tot_len, GFP_ATOMIC);
> +                if (likely(!err)) {
> +                        if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
> +                                BUG();
> +                }

This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
we probably don't want to do that. It's also possible that the first
skb received isn't necessarily the first packet in the reassembled
packet.

This is effectively optimizing for the case where all packets are
large and will generate a frag list after reassembly. However, it's
possible in many situations that packets can be reassembled with zero
copy that doesn't later need to be split (maybe the total packet is
around 20k). Do you know how the performance compares vs. just
linearizing at the end in that case? Is there a benefit to doing an
early copy?

> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>
>         FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
>         FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
> -        FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
> -        stt_percpu->frag_mem_used += skb->truesize;
> +        if (!copied_skb) {
> +                FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
> +                stt_percpu->frag_mem_used += skb->truesize;
> +        }

pskb_expand_head() doesn't actually change the truesize so our count
will be more inaccurate than before. However, we don't have to worry
about the attack case of very small packets consuming a large amount
of memory due to having many copies of struct sk_buff.
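The accounting change in the last hunk — charge a segment's truesize only when its skb is kept, not when its bytes were copied into the preallocated first skb — reduces to a small guard. A hedged model; `frag_acct` and its fields are illustrative stand-ins for the FRAG_CB(...)->first and stt_percpu->frag_mem_used counters in stt.c:

```c
#include <assert.h>

struct frag_acct {
        int rcvd_len;   /* payload bytes received so far        */
        int mem_used;   /* truesize charged against frag memory */
};

/* A newly arrived segment always counts toward rcvd_len, but its
 * truesize is only charged when the skb itself is retained (i.e. its
 * data was not copied into the first skb and the skb freed). */
static void account_segment(struct frag_acct *acct, int len, int truesize,
                            int copied)
{
        acct->rcvd_len += len;
        if (!copied)
                acct->mem_used += truesize;
}
```

This is also where Jesse's point bites: since pskb_expand_head() does not update truesize, the first skb's charge understates its enlarged buffer, so mem_used drifts low in the copied case.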
[ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
In the STT implementation we saw performance improvements from
linearizing the skb in the SLUB case, so the following patch skips the
zero-copy operation in that case.

Tested-By: Vasmi Abidi
Signed-off-by: Pravin B Shelar
---
 datapath/linux/compat/stt.c | 77 +
 1 file changed, 71 insertions(+), 6 deletions(-)

diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index eb397e8..ae33d5e 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -49,6 +49,15 @@
 #define STT_DST_PORT 7471

 #ifdef OVS_STT
+#ifdef CONFIG_SLUB
+/*
+ * We saw better performance with skipping zero copy in case of SLUB.
+ * So skip zero copy for SLUB case.
+ */
+#define SKIP_ZERO_COPY
+#endif
+
+
 #define STT_VER 0

 /* @list: Per-net list of STT ports.
@@ -286,6 +295,7 @@ static int straighten_frag_list(struct sk_buff **skbp)
        return 0;
 }

+#ifndef SKIP_ZERO_COPY
 static void copy_skb_metadata(struct sk_buff *to, struct sk_buff *from)
 {
        to->protocol = from->protocol;
@@ -464,10 +474,14 @@ static int skb_list_segment(struct sk_buff *head, bool ipv4, int l4_offset)
        update_headers(head, true, l4_offset, hdr_len, ipv4, 0);
        return 0;
 }
+#endif

 static int coalesce_skb(struct sk_buff **headp)
 {
-       struct sk_buff *frag, *head, *prev;
+       struct sk_buff *frag, *head;
+#ifndef SKIP_ZERO_COPY
+       struct sk_buff *prev;
+#endif
        int err;

        err = straighten_frag_list(headp);
@@ -475,6 +489,7 @@ static int coalesce_skb(struct sk_buff **headp)
                return err;
        head = *headp;

+#ifndef SKIP_ZERO_COPY
        /* Coalesce frag list. */
        prev = head;
        for (frag = head->next; frag; frag = frag->next) {
@@ -500,6 +515,7 @@ static int coalesce_skb(struct sk_buff **headp)
        if (!head->next)
                return 0;
+#endif

        for (frag = head->next; frag; frag = frag->next) {
                head->len += frag->len;
                head->data_len += frag->len;
@@ -508,9 +524,16 @@ static int coalesce_skb(struct sk_buff **headp)
        skb_shinfo(head)->frag_list = head->next;
        head->next = NULL;

+#ifdef SKIP_ZERO_COPY
+       if (skb_shinfo(head)->frag_list) {
+               err = __skb_linearize(head);
+               return err;
+       }
+#endif
        return 0;
 }

+#ifndef SKIP_ZERO_COPY
 static int __try_to_segment(struct sk_buff *skb, bool csum_partial,
                            bool ipv4, bool tcp, int l4_offset)
 {
@@ -519,9 +542,16 @@ static int __try_to_segment(struct sk_buff *skb, bool csum_partial,
        else
                return skb_linearize(skb);
 }
+#endif

 static int try_to_segment(struct sk_buff *skb)
 {
+#ifdef SKIP_ZERO_COPY
+       /* Since coalesce_skb() does not generate a frag-list, there is
+        * no need to linearize here.
+        */
+       return 0;
+#else
        struct stthdr *stth = stt_hdr(skb);
        bool csum_partial = !!(stth->flags & STT_CSUM_PARTIAL);
        bool ipv4 = !!(stth->flags & STT_PROTO_IPV4);
@@ -529,11 +559,13 @@ static int try_to_segment(struct sk_buff *skb)
        int l4_offset = stth->l4_offset;

        return __try_to_segment(skb, csum_partial, ipv4, tcp, l4_offset);
+#endif
 }

 static int segment_skb(struct sk_buff **headp, bool csum_partial,
                       bool ipv4, bool tcp, int l4_offset)
 {
+#ifndef SKIP_ZERO_COPY
        int err;

        err = coalesce_skb(headp);
@@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
        if (skb_shinfo(*headp)->frag_list)
                return __try_to_segment(*headp, csum_partial,
                                        ipv4, tcp, l4_offset);
+#endif
        return 0;
 }

@@ -1054,13 +1087,28 @@ static struct pkt_frag *lookup_frag(struct net *net,
        return victim_frag;
 }

+#ifdef SKIP_ZERO_COPY
+static int __copy_skb(struct sk_buff *to, struct sk_buff *from)
+{
+       if (to->next)
+               return -EINVAL;
+
+       if (unlikely(to->data_len || (from->len > skb_tailroom(to))))
+               return -EINVAL;
+
+       skb_copy_bits(from, 0, skb_put(to, from->len), from->len);
+       return 0;
+}
+#else
+#define __copy_skb(a, b) -EINVAL;
+#endif
+
 static struct sk_buff *reassemble(struct sk_buff *skb)
 {
        struct iphdr *iph = ip_hdr(skb);
        struct tcphdr *tcph = tcp_hdr(skb);
        u32 seq = ntohl(tcph->seq);
        struct stt_percpu *stt_percpu;
-       struct sk_buff *last_skb;
+       struct sk_buff *last_skb, *copied_skb = NULL;
        struct pkt_frag *frag;
        struct pkt_key key;
        int tot_len;
@@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)

        frag = lookup_frag(dev_net(skb->dev), stt_percpu, , hash);
        if (!frag->skbs) {
+               int err;
+
+               err = pskb_expand_head(skb, skb_headroom(skb),
+                                      skb->data_len + tot_len,
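The __copy_skb() helper in the patch appends a segment's payload into tailroom reserved in the first skb, via skb_copy_bits() into skb_put(), and refuses with -EINVAL when it would not fit. That bounds-check-then-append shape can be sketched in userspace C; `tail_buf` and a fixed-size array stand in for the skb's linear area, and these names are illustrative, not from stt.c:

```c
#include <assert.h>
#include <string.h>

#define BUF_CAP 2048

struct tail_buf {
        unsigned char data[BUF_CAP];
        int len;   /* bytes currently used; tailroom = BUF_CAP - len */
};

/* Append src into dst's remaining tailroom; fail (like __copy_skb()'s
 * -EINVAL) when the payload would not fit. */
static int copy_into_tail(struct tail_buf *dst, const unsigned char *src,
                          int src_len)
{
        int tailroom = BUF_CAP - dst->len;

        if (src_len > tailroom)
                return -1;
        memcpy(dst->data + dst->len, src, src_len);
        dst->len += src_len;
        return 0;
}
```

In the patch, a failed copy falls back to the normal path of keeping the segment skb on the frag list, which is why the helper can simply refuse rather than grow the buffer.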