Hi Ben, I was working on the code changes and I can think of two approaches we can take to prevent this crash in the dpdk datapath.
1. Today the dp_packet module that we have is never return any error . The only error handling it is by assert using OVS_NOT_REACHED(). Should we go ahead and restructure the entire dp_packet module to do some error handling and return (refer to openvswitchlib/dp-packet.c). This would require a more detailed effort to ensure all flows are handled. 2. Do not call the dp_packet_uninit without properly checking we have adequate headroom for DPBUF_DPDK buffers. Any suggestions on which approach we should go ahead with ? Regards & Thanks Anju -----Original Message----- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Friday, May 11, 2018 2:00 AM To: Anju Thomas <anju.tho...@ericsson.com> Cc: d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote: > It looks like your commit message describes at least two other bugs in > OVS, though. First, if OVS crashes when it pushes tunnel headers, > even if there's not enough headroom, that's really bad. At worst, it > should drop the packet. Do you know where the crash occurs? We > should fix the problem, since it might recur in some other context. > > [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual > assert in our case. > The rootcause is that dp receives the actions after the upcall (say with >=3 > tunnel pushes ) . Now as part of action processing , since it is a tunnel > push action , we try to find space in the dpdk mbuf packet headroom (which Is > 128 bytes). By the time we try to process the third tunnel push , there is no > headroom left since each tunnel header is of 50 bytes (50 *3 > 128 bytes > headroom). Hence it manually asserts . This assert is to catch any > unexpected code flow . Do you think that in this case we should still go > ahead and prevent the crash ? I don't understand why it's OK to crash in this case. Why do you think so? > Second, it's a little shocking to hear that an encapsulation action without a > following output action causes a memory leak. We also need to fix that. Do > you have any more details? > [Anju] Now as explained above, the crash happens because we run out of > headroom. But in case we have say 2 or less than 2 tunnel pushes we will have > a mem leak as packet is never freed because the tnl push is the dp last > action and there is no other output action or any other action like recirc > that may translate to output action in the end leading to packet buffer not > being freed. > Are you proposing that we have some sort of preventive fix in the dp to > handle an incorrect action list from the upcall handling? Yes. It's unacceptable to leak memory because there's a packet modification without an output action. The kernel datapath never does this, for example. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev