Hi Ben, Replies inline:
Regards & thanks Anju -----Original Message----- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Thursday, May 10, 2018 1:59 AM To: Anju Thomas <anju.tho...@ericsson.com> Cc: d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action On Tue, May 08, 2018 at 12:34:54AM +0530, Anju Thomas wrote: > During slow path packet processing, if the action is to output to a > tunnel port, the slow path processing of the encapsulated packet > continues on the underlay bridge and additional actions (e.g. optional > VLAN encapsulation, bond link selection and finally output to port) > are collected there. > > To prepare for a continuation of the processing of the original packet > (e.g. output to other tunnel ports in a flooding scenario), the > “tunnel_push” action and the actions of the underlay bridge are > encapsulated in a clone() action to preserve the original packet. > > If the underlay bridge decides to drop the tunnel packet (for example > if both bonded ports are down simultaneously), the clone(tunnel_push)) > actions previously generated as part of translation of the output to > tunnel port are discarded and a stand-alone tunnel_push action is > added instead. Thus the tunnel header is pushed on to the original packet. > This is the bug. > > Consequences: If packet processing continues with sending to further > tunnel ports, multiple tunnel header pushes will happen on the > original packet as typically the tunnels all traverse the same > underlay bond which is down. The packet may not have enough headroom > to accommodate all the tunnel headers. OVS crashes if it runs out of > space while trying to push the tunnel headers. > > Even in case there is enough headroom, the packet will not be freed > since the accumulated action list contains only the tunnel header push > action without any output port action. Thus, we either have a crash or > a packet buffer leak. > > Signed-off-by: Anju Thomas <anju.tho...@ericsson.com> Thanks for the patch. It applies OK and all the tests pass. It looks like your commit message describes at least two other bugs in OVS, though. First, if OVS crashes when it pushes tunnel headers, even if there's not enough headroom, that's really bad. At worst, it should drop the packet. Do you know where the crash occurs? We should fix the problem, since it might recur in some other context. [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual assert in our case. The rootcause is that dp receives the actions after the upcall (say with >=3 tunnel pushes ) . Now as part of action processing , since it is a tunnel push action , we try to find space in the dpdk mbuf packet headroom (which Is 128 bytes). By the time we try to process the third tunnel push , there is no headroom left since each tunnel header is of 50 bytes (50 *3 > 128 bytes headroom). Hence it manually asserts . This assert is to catch any unexpected code flow . Do you think that in this case we should still go ahead and prevent the crash ? The back trace was as below:- (gdb) bt #0 0x00007ffa9a856c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007ffa9a85a028 in __GI_abort () at abort.c:89 #2 0x00000000005fc725 in dp_packet_resize__ (b=b@entry=0x7ffa87744680, new_headroom=new_headroom@entry=64, new_tailroom=<optimized out>) at lib/dp-packet.c:258 #3 0x00000000005fcb1f in dp_packet_prealloc_headroom (b=b@entry=0x7ffa87744680, size=size@entry=50) at lib/dp-packet.c:288 #4 0x00000000005fcf91 in dp_packet_push_uninit (b=b@entry=0x7ffa87744680, size=size@entry=50) at lib/dp-packet.c:400 #5 0x000000000069466c in netdev_tnl_push_ip_header (packet=packet@entry=0x7ffa87744680, header=0x7ff85401bab0, size=50, ip_tot_size=ip_tot_size@entry=0x7ff8117f89fc) at lib/netdev-native-tnl.c:152 #6 0x000000000069472a in netdev_tnl_push_udp_header (packet=0x7ffa87744680, data=<optimized out>) at lib/netdev-native-tnl.c:215 #7 0x0000000000625627 in netdev_push_header (netdev=0x3ca3990, batch=batch@entry=0x7ff8117f9498, data=data@entry=0x7ff85401baa0) at lib/netdev.c:874 #8 0x00000000006069f2 in push_tnl_action (batch=0x7ff8117f9498, attr=0x7ff85401ba9c, pmd=0x33efb30) at lib/dpif-netdev.c:4629 #9 dp_execute_cb (aux_=aux_@entry=0x7ff8117f9840, packets_=packets_@entry=0x7ff8117f9498, a=a@entry=0x7ff85401ba9c, may_steal=false) static void dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom) { void *new_base, *new_data; size_t new_allocated; new_allocated = new_headroom + dp_packet_size(b) + new_tailroom; switch (b->source) { case DPBUF_DPDK: OVS_NOT_REACHED();-------------------------------------------------------------------------------------------------->crashes here Second, it's a little shocking to hear that an encapsulation action without a following output action causes a memory leak. We also need to fix that. Do you have any more details? [Anju] Now as explained above, the crash happens because we run out of headroom. But in case we have say 2 or less than 2 tunnel pushes we will have a mem leak as packet is never freed because the tnl push is the dp last action and there is no other output action or any other action like recirc that may translate to output action in the end leading to packet buffer not being freed. Are you proposing that we have some sort of preventive fix in the dp to handle an incorrect action list from the upcall handling? Thanks, Ben. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev