Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

Ilya Maximets Tue, 15 May 2018 01:35:03 -0700

Thanks for review.

Yes, you're right that currently 'netdev_push_header' always returns 0.
I think it happened while introducing 'netdev_has_tunnel_push_pop()'. But
even before that, the error code was only for the ENOTSUPP and the only
place, where 'netdev_push_header' actually could fail is 'OVS_NOT_REACHED()'
inside 'dp_packet_resize__()'. And it will abort there.


So, the only way to make a proper error handling is to allow at least
'dp_packet_resize__()' to fail and return proper error code. But this will
require, I guess, a big refactoring of the whole dp_packet module with
it's dependencies.

But, yes, I still think that things like 'OVS_NOT_REACHED()' for the
reachable code paths is a really bad thing. And we need to fix this.

About my patch:
The main idea was just to 'break' instead of 'return' in case where
'may_steal == true'. This triggers the 'dp_packet_batch_free()' placed
at the end of the function and prevents the packet leak.
There are no changes in error handling code, only refactoring to prevent
double free. Maybe I should make it more clear in a first non-RFC version.
Maybe something like this:
--------------------------------------------------------------------------
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f86ed2a..9476a03 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5672,12 +5672,23 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
         break;
 
     case OVS_ACTION_ATTR_TUNNEL_PUSH:
-        if (*depth < MAX_RECIRC_DEPTH) {
-            dp_packet_batch_apply_cutlen(packets_);
-            push_tnl_action(pmd, a, packets_);
-            return;
+        /*
+         * XXX: 'may_steal' concept is broken here, because we're
+         *      unconditionally changing the packets just like for other PUSH_*
+         *      actions in 'odp_execute()'. 'false' value could be ignored,
+         *      because we could reach here only after clone, but we still need
+         *      to free the packets in case 'may_steal == true'.
+         */
+        if (may_steal) {
+            /* We're requested to push tunnel header, but also we need to take
+             * the ownership of these packets. So, we may not execute the
+             * action because the caller will not use the result anyway.
+             * Just break to free the batch. */
+            break;
         }
-        break;
+        dp_packet_batch_apply_cutlen(packets_);
+        push_tnl_action(pmd, a, packets_);
+        return;
 
     case OVS_ACTION_ATTR_TUNNEL_POP:
         if (*depth < MAX_RECIRC_DEPTH) {
--------------------------------------------------------------------------

Best regards, Ilya Maximets.

On 15.05.2018 09:43, Anju Thomas wrote:
> Hi Ilya,
> I had a look at the patch that you have submitted.
> In push_tnl_action, I can see that with the patch you are trying to return 
> error  in case  we don’t find the tunnel port or return the error that 
> netdev_push header returns.  And we are deciding to delete the batch in the 
> calling function dp_execute_cb in case of error.  But netdev_push_header 
> always returns 0.  Actually the following code in push_tnl_action is dead 
> code.
> 
>  
>     if (!err) {
>         return 0;
>     }
> 
> The actual issue is that the push_header function always return void . So any 
> error happening during push_header is never captured or acted upon.
> 
> I think the fix also be should be to add/capture error scenarios during tnl 
> header push as well.
> 
> Regards & Thanks
> Anju
> 
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com] 
> Sent: Monday, May 14, 2018 6:28 PM
> To: ovs-dev@openvswitch.org; Anju Thomas <anju.tho...@ericsson.com>; Ben 
> Pfaff <b...@ovn.org>
> Cc: Tiago Lam <tiago....@intel.com>
> Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action
> 
> Hello Anju, Ben,
> 
> Looks like I fixed a leak reported here in my recent patch:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346892.html
> 
> Could you, please, take a look at it?
> 
> I actually managed to reproduce the packet leak on tunnel-push-pop unit tests 
> with valgrind:
> 
> ==6445== 2,912 (2,208 direct, 704 indirect) bytes in 4 blocks are definitely 
> lost in loss record 400 of 410
> ==6445==    at 0x4C2DB8F: malloc (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==6445==    by 0x516314: xmalloc (util.c:120)
> ==6445==    by 0x466154: dp_packet_new (dp-packet.c:138)
> ==6445==    by 0x466154: dp_packet_new_with_headroom (dp-packet.c:148)
> ==6445==    by 0x4F6CFE: eth_from_hex (packets.c:498)
> ==6445==    by 0x48EB43: eth_from_packet (netdev-dummy.c:1450)
> ==6445==    by 0x48EB43: netdev_dummy_receive (netdev-dummy.c:1562)
> ==6445==    by 0x515980: process_command (unixctl.c:313)
> ==6445==    by 0x515980: run_connection (unixctl.c:347)
> ==6445==    by 0x515980: unixctl_server_run (unixctl.c:398)
> ==6445==    by 0x406F1E: main (ovs-vswitchd.c:120)
> 
> Above patch fixes it.
> 
> Best regards, Ilya Maximets.
> 
>> Hi Ben,
>>
>> I will work on these changes as well.
>>
>> Regards
>> Anju
>>
>> -----Original Message-----
>> From: Ben Pfaff [mailto:blp at ovn.org]
>> Sent: Friday, May 11, 2018 2:00 AM
>> To: Anju Thomas <anju.thomas at ericsson.com>
>> Cc: dev at openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push 
>> action
>>
>> On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
>>> It looks like your commit message describes at least two other bugs 
>>> in OVS, though.  First, if OVS crashes when it pushes tunnel headers, 
>>> even if there's not enough headroom, that's really bad.  At worst, it 
>>> should drop the packet.  Do you know where the crash occurs?  We 
>>> should fix the problem, since it might recur in some other context.
>>>
>>> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a 
>>> manual assert in our case.
>>> The rootcause is that dp receives the actions after the upcall (say with 
>>> >=3 tunnel pushes ) . Now as part of action processing , since it is a 
>>> tunnel push action , we try to find space in the dpdk mbuf packet headroom 
>>> (which Is 128 bytes). By the time we try to process the third tunnel push , 
>>> there is no headroom left since each tunnel header is of 50 bytes (50 *3 > 
>>> 128 bytes headroom). Hence it manually asserts .  This assert is to catch 
>>> any unexpected code flow . Do you think that in this case we should still 
>>> go ahead and  prevent the crash ? 
>>
>> I don't understand why it's OK to crash in this case.  Why do you think so?
>>
>>> Second, it's a little shocking to hear that an encapsulation action without 
>>> a following output action causes a memory leak.  We also need to fix that.  
>>> Do you have any more details?
>>> [Anju] Now as explained above, the crash happens because we run out of 
>>> headroom. But in case we have say 2 or less than 2 tunnel pushes we will 
>>> have a mem leak as packet is never freed because the tnl push is the dp 
>>> last action and there is no other output action or any other action like 
>>> recirc that may translate to output action in the end leading  to packet 
>>> buffer not being freed.
>>> Are you proposing that we have some sort of preventive fix in the dp to 
>>> handle an incorrect action list from the upcall handling? 
>>
>> Yes.  It's unacceptable to leak memory because there's a packet modification 
>> without an output action.  The kernel datapath never does this, for example.
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

Reply via email to