王亮 Liang Wang <wanglian...@didiglobal.com> writes: > Hi, Conole. Thank you for pointing out the mistakes in my last patch. Yes, I > did missed some something important. > The correct and complete way to fix this problem is like the following patch. > In fact, we have found and fixed this > crash problem one year ago; and this patch has worked very well in our > production environment for 11 months. > > From 7e13a39cdbdfb484d4cb6b074f08471dc16d22b7 Mon Sep 17 00:00:00 2001 > From: Wang Liang <wanglian...@didiglobal.com> > Date: Thu, 15 Apr 2021 13:52:01 +0800 > Subject: [PATCH] Fix userspace datapath crash caused by IP fragments > > The ovs userspace datapath will crash on openflow packet-out message > with IP packet largger than MTU. This patch will fix the problem. > When pre-processing the IP fragments, clone the packet if it has > 'dnsteal' flag set. > > Signed-off-by: Wang Liang <wanglian...@didiglobal.com> > --- > lib/ipf.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/lib/ipf.c b/lib/ipf.c > index c20bcc0b3..176140afb 100644 > --- a/lib/ipf.c > +++ b/lib/ipf.c > @@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list > *ipf_list, > * recommend not setting the mempool number of buffers too low > * and also clamp the number of fragments. */ > struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1]; > - frag->pkt = pkt; > + frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt; > frag->start_data_byte = start_data_byte; > frag->end_data_byte = end_data_byte; > frag->dnsteal = dnsteal; > @@ -1338,9 +1338,7 @@ ipf_destroy(struct ipf *ipf) > while (ipf_list->last_sent_idx < ipf_list->last_inuse_idx) { > struct dp_packet *pkt > = ipf_list->frag_list[ipf_list->last_sent_idx + 1].pkt; > - if (!ipf_list->frag_list[ipf_list->last_sent_idx + 1].dnsteal) { > - dp_packet_delete(pkt); > - } > + dp_packet_delete(pkt);
This looks better. There are no more uses of (struct ipf_frag *)->dnsteal - so I think we should remove it from the struct as well. I know you wrote that you've been running this in production for a year, and I think the change looks correct, but is it possible to have a test case (maybe even a pcap that we can extract) to go along with it? > atomic_count_dec(&ipf->nfrag); > ipf_list->last_sent_idx++; > } _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev