Hi Ben,

We are facing multiple such crashes on different computes in our deployments. 
Seems to be a pretty common problem in our setup.  As you suggested, it would 
be good if we can make the below changes as well.    How do you suggest we move 
forward regarding the approaches?

Regards & thanks
Anju

-----Original Message-----
From: Anju Thomas 
Sent: Tuesday, June 05, 2018 10:26 AM
To: Ben Pfaff <b...@ovn.org>
Cc: d...@openvswitch.org
Subject: RE: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action


Hi ,

Any suggestions ?

Will it be ok if  we merge the tunnel push change that we have discussed below 
while I can work on the other changes in parallel?

Regards & thanks
Anju
-----Original Message-----
From: Anju Thomas
Sent: Thursday, May 17, 2018 3:52 PM
To: 'Ben Pfaff' <b...@ovn.org>
Cc: d...@openvswitch.org
Subject: RE: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

Hi Ben,
I was working on the code changes and I can think of two approaches we can take 
to prevent this crash in the dpdk datapath.

1. Today the dp_packet module that we have is never return any error . The only 
error handling it is by assert using OVS_NOT_REACHED(). Should we go ahead and 
restructure the entire dp_packet module to do some error handling and  return 
(refer to openvswitchlib/dp-packet.c). This would require a more detailed 
effort to ensure all flows are handled.
2.  Do not call the  dp_packet_uninit without properly checking we have 
adequate headroom for DPBUF_DPDK buffers. 

Any suggestions on which approach we should go ahead with  ?

Regards & Thanks
Anju
-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: Friday, May 11, 2018 2:00 AM
To: Anju Thomas <anju.tho...@ericsson.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
> It looks like your commit message describes at least two other bugs in 
> OVS, though.  First, if OVS crashes when it pushes tunnel headers, 
> even if there's not enough headroom, that's really bad.  At worst, it 
> should drop the packet.  Do you know where the crash occurs?  We 
> should fix the problem, since it might recur in some other context.
> 
> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual 
> assert in our case.
> The rootcause is that dp receives the actions after the upcall (say with >=3 
> tunnel pushes ) . Now as part of action processing , since it is a tunnel 
> push action , we try to find space in the dpdk mbuf packet headroom (which Is 
> 128 bytes). By the time we try to process the third tunnel push , there is no 
> headroom left since each tunnel header is of 50 bytes (50 *3 > 128 bytes 
> headroom). Hence it manually asserts .  This assert is to catch any 
> unexpected code flow . Do you think that in this case we should still go 
> ahead and  prevent the crash ? 

I don't understand why it's OK to crash in this case.  Why do you think so?

> Second, it's a little shocking to hear that an encapsulation action without a 
> following output action causes a memory leak.  We also need to fix that.  Do 
> you have any more details?
> [Anju] Now as explained above, the crash happens because we run out of 
> headroom. But in case we have say 2 or less than 2 tunnel pushes we will have 
> a mem leak as packet is never freed because the tnl push is the dp last 
> action and there is no other output action or any other action like recirc 
> that may translate to output action in the end leading  to packet buffer not 
> being freed.
> Are you proposing that we have some sort of preventive fix in the dp to 
> handle an incorrect action list from the upcall handling? 

Yes.  It's unacceptable to leak memory because there's a packet modification 
without an output action.  The kernel datapath never does this, for example.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to