Hi Ilya,
I had a look at the patch that you have submitted.
In push_tnl_action, I can see that with the patch you are trying to return 
error  in case  we don’t find the tunnel port or return the error that 
netdev_push header returns.  And we are deciding to delete the batch in the 
calling function dp_execute_cb in case of error.  But netdev_push_header always 
returns 0.  Actually the following code in push_tnl_action is dead code.

 
    if (!err) {
        return 0;
    }

The actual issue is that the push_header function always return void . So any 
error happening during push_header is never captured or acted upon.

I think the fix also be should be to add/capture error scenarios during tnl 
header push as well.

Regards & Thanks
Anju

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
Sent: Monday, May 14, 2018 6:28 PM
To: ovs-dev@openvswitch.org; Anju Thomas <anju.tho...@ericsson.com>; Ben Pfaff 
<b...@ovn.org>
Cc: Tiago Lam <tiago....@intel.com>
Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

Hello Anju, Ben,

Looks like I fixed a leak reported here in my recent patch:
  https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346892.html

Could you, please, take a look at it?

I actually managed to reproduce the packet leak on tunnel-push-pop unit tests 
with valgrind:

==6445== 2,912 (2,208 direct, 704 indirect) bytes in 4 blocks are definitely 
lost in loss record 400 of 410
==6445==    at 0x4C2DB8F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6445==    by 0x516314: xmalloc (util.c:120)
==6445==    by 0x466154: dp_packet_new (dp-packet.c:138)
==6445==    by 0x466154: dp_packet_new_with_headroom (dp-packet.c:148)
==6445==    by 0x4F6CFE: eth_from_hex (packets.c:498)
==6445==    by 0x48EB43: eth_from_packet (netdev-dummy.c:1450)
==6445==    by 0x48EB43: netdev_dummy_receive (netdev-dummy.c:1562)
==6445==    by 0x515980: process_command (unixctl.c:313)
==6445==    by 0x515980: run_connection (unixctl.c:347)
==6445==    by 0x515980: unixctl_server_run (unixctl.c:398)
==6445==    by 0x406F1E: main (ovs-vswitchd.c:120)

Above patch fixes it.

Best regards, Ilya Maximets.

> Hi Ben,
> 
> I will work on these changes as well.
> 
> Regards
> Anju
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: Friday, May 11, 2018 2:00 AM
> To: Anju Thomas <anju.thomas at ericsson.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push 
> action
> 
> On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
>> It looks like your commit message describes at least two other bugs 
>> in OVS, though.  First, if OVS crashes when it pushes tunnel headers, 
>> even if there's not enough headroom, that's really bad.  At worst, it 
>> should drop the packet.  Do you know where the crash occurs?  We 
>> should fix the problem, since it might recur in some other context.
>> 
>> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual 
>> assert in our case.
>> The rootcause is that dp receives the actions after the upcall (say with >=3 
>> tunnel pushes ) . Now as part of action processing , since it is a tunnel 
>> push action , we try to find space in the dpdk mbuf packet headroom (which 
>> Is 128 bytes). By the time we try to process the third tunnel push , there 
>> is no headroom left since each tunnel header is of 50 bytes (50 *3 > 128 
>> bytes headroom). Hence it manually asserts .  This assert is to catch any 
>> unexpected code flow . Do you think that in this case we should still go 
>> ahead and  prevent the crash ? 
> 
> I don't understand why it's OK to crash in this case.  Why do you think so?
> 
>> Second, it's a little shocking to hear that an encapsulation action without 
>> a following output action causes a memory leak.  We also need to fix that.  
>> Do you have any more details?
>> [Anju] Now as explained above, the crash happens because we run out of 
>> headroom. But in case we have say 2 or less than 2 tunnel pushes we will 
>> have a mem leak as packet is never freed because the tnl push is the dp last 
>> action and there is no other output action or any other action like recirc 
>> that may translate to output action in the end leading  to packet buffer not 
>> being freed.
>> Are you proposing that we have some sort of preventive fix in the dp to 
>> handle an incorrect action list from the upcall handling? 
> 
> Yes.  It's unacceptable to leak memory because there's a packet modification 
> without an output action.  The kernel datapath never does this, for example.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to