Re: [ovs-dev] [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

2019-11-26 Thread Ilya Maximets
On 25.11.2019 22:45, Ben Pfaff wrote:
> On Fri, Nov 01, 2019 at 10:24:39PM +0100, Ilya Maximets wrote:
>> Handling of OpenFlow PACKET_OUT implies pushing the packet through
>> the datapath and packet processing inside the datapath could trigger
>> an upcall.  In case of system datapath, 'dpif_execute()' sends packet
>> to the kernel module and returns.  If any, upcall  will be triggered
>> inside the kernel and handled by a separate thread in userspace.
>> But in case of userspace datapath full processing of the packet happens
>> inside the 'dpif_execute()' in the same thread that handled PACKET_OUT.
>> This causes an issue if upcall will lead to modification of flow rules.
> 
> Thank you very much for the fix.  It is simpler and cleaner than I
> expected.  It looks correct to me and it passes all of the tests for me
> locally.
> 
> Acked-by: Ben Pfaff 

Thanks!  Applied to master and backported down to 2.7.

There was a big refactoring between branches 2.6 and 2.7, so this
patch could not be applied to 2.6.  However, IIUC, handle_packet_out()
on this branch doesn't take 'ofproto_mutex' at all.  Looks a bit dangerous.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

2019-11-25 Thread Ben Pfaff
On Fri, Nov 01, 2019 at 10:24:39PM +0100, Ilya Maximets wrote:
> Handling of OpenFlow PACKET_OUT implies pushing the packet through
> the datapath and packet processing inside the datapath could trigger
> an upcall.  In case of system datapath, 'dpif_execute()' sends packet
> to the kernel module and returns.  If any, upcall  will be triggered
> inside the kernel and handled by a separate thread in userspace.
> But in case of userspace datapath full processing of the packet happens
> inside the 'dpif_execute()' in the same thread that handled PACKET_OUT.
> This causes an issue if upcall will lead to modification of flow rules.

Thank you very much for the fix.  It is simpler and cleaner than I
expected.  It looks correct to me and it passes all of the tests for me
locally.

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

2019-11-21 Thread Anil Kumar Koli via dev
Hi Ben,

I was not able to reproduce this issue in ovs sandbox, but able to test this 
patch on local testbed which had this issue and the patch works.

Thanks & Regards,
Anil Kumar.

-Original Message-
From: Ilya Maximets  
Sent: Friday, 22 November, 2019 12:02 AM
To: Ilya Maximets ; ovs-dev@openvswitch.org; Ben Pfaff 

Cc: Anil Kumar Koli 
Subject: Re: [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking 
after upcall.

On 01.11.2019 22:24, Ilya Maximets wrote:
> Handling of OpenFlow PACKET_OUT implies pushing the packet through the 
> datapath and packet processing inside the datapath could trigger an 
> upcall.  In case of system datapath, 'dpif_execute()' sends packet to 
> the kernel module and returns.  If any, upcall  will be triggered 
> inside the kernel and handled by a separate thread in userspace.
> But in case of userspace datapath full processing of the packet 
> happens inside the 'dpif_execute()' in the same thread that handled 
> PACKET_OUT.
> This causes an issue if upcall will lead to modification of flow rules.
> For example, it could happen while processing of 'learn' actions.
> Since whole handling of PACKET_OUT is protected by 'ofproto_mutex', 
> OVS will assert on attempt to take it recursively while processing 
> 'learn' actions:
> 
>0 __GI_raise (sig=sig@entry=6)
>1 __GI_abort ()
>2 ovs_abort_valist ()
>3 ovs_abort ()
>4 ovs_mutex_lock_at (where=where@entry=0xad4199 "ofproto/ofproto.c:5391")
> 
>5 ofproto_flow_mod_learn ()   at ofproto/ofproto.c:5391
> 
>6 xlate_learn_action ()   at ofproto/ofproto-dpif-xlate.c:5378
> <'learn' action found>
>7 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6893
>8 xlate_recursively ()at ofproto/ofproto-dpif-xlate.c:4233
>9 xlate_table_action ()   at ofproto/ofproto-dpif-xlate.c:4361
>   10 in xlate_ofpact_resubmit () at ofproto/ofproto-dpif-xlate.c:4672
>   11 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6773
>   12 xlate_actions ()at ofproto/ofproto-dpif-xlate.c:7570
>  
>   13 upcall_xlate () at ofproto/ofproto-dpif-upcall.c:1197
>   14 process_upcall ()   at ofproto/ofproto-dpif-upcall.c:1413
>   15 upcall_cb ()at ofproto/ofproto-dpif-upcall.c:1315
>   16 dp_netdev_upcall (DPIF_UC_MISS) at lib/dpif-netdev.c:6236
>  
>   17 handle_packet_upcall () at lib/dpif-netdev.c:6591
>   18 fast_path_processing () at lib/dpif-netdev.c:6709
>   19 dp_netdev_input__ ()at lib/dpif-netdev.c:6797
>   20 dp_netdev_recirculate ()at lib/dpif-netdev.c:6842
>   21 dp_execute_cb ()at lib/dpif-netdev.c:7158
>   22 odp_execute_actions ()  at lib/odp-execute.c:794
>   23 dp_netdev_execute_actions ()at lib/dpif-netdev.c:7332
>   24 dpif_netdev_execute ()  at lib/dpif-netdev.c:3725
>   25 dpif_netdev_operate ()  at lib/dpif-netdev.c:3756
>  
>   26 dpif_operate () at lib/dpif.c:1367
>   27 dpif_execute () at lib/dpif.c:1321
>   28 packet_execute ()   at ofproto/ofproto-dpif.c:4760
>   29 ofproto_packet_out_finish ()at ofproto/ofproto.c:3594
>  
>   30 handle_packet_out ()at ofproto/ofproto.c:3635
>   31 handle_single_part_openflow (OFPTYPE_PACKET_OUT) at 
> ofproto/ofproto.c:8400
>   32 handle_openflow ()   at 
> ofproto/ofproto.c:8587
>   33 ofconn_run ()
>   34 connmgr_run ()
>   35 ofproto_run ()
>   36 bridge_run__ ()
>   37 bridge_run ()
>   38 main ()
> 
> Fix that by splitting the 'ofproto_packet_out_finish()' in two parts.
> First one that translates side-effects and requires holding 'ofproto_mutex'
> and the second that only pushes packet to datapath.  The second part 
> moved out of 'ofproto_mutex' because 'ofproto_mutex' is not required 
> and actually should not be taken in order to avoid recursive locking.
> 
> Reported-by: Anil Kumar Koli 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.h
> tml
> Signed-off-by: Ilya Maximets 
> ---
> 
> Previous discussion about implementation:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362082.html
> 
> I'm not able to reproduce the original issue, so I can not test if 
> this patch fixes it.
> 
>  ofproto/ofproto-dpif.c | 40 --
>  ofproto/ofproto-provider.h | 12 +---
>  ofproto/ofproto.c  | 29 +++
>  3 files changed, 64 insertions(+), 17 deletions(-)


Hi Ben,

Could you, please, take a look at this patch?

Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

2019-11-21 Thread Ilya Maximets
On 01.11.2019 22:24, Ilya Maximets wrote:
> Handling of OpenFlow PACKET_OUT implies pushing the packet through
> the datapath and packet processing inside the datapath could trigger
> an upcall.  In case of system datapath, 'dpif_execute()' sends packet
> to the kernel module and returns.  If any, upcall  will be triggered
> inside the kernel and handled by a separate thread in userspace.
> But in case of userspace datapath full processing of the packet happens
> inside the 'dpif_execute()' in the same thread that handled PACKET_OUT.
> This causes an issue if upcall will lead to modification of flow rules.
> For example, it could happen while processing of 'learn' actions.
> Since whole handling of PACKET_OUT is protected by 'ofproto_mutex',
> OVS will assert on attempt to take it recursively while processing
> 'learn' actions:
> 
>0 __GI_raise (sig=sig@entry=6)
>1 __GI_abort ()
>2 ovs_abort_valist ()
>3 ovs_abort ()
>4 ovs_mutex_lock_at (where=where@entry=0xad4199 "ofproto/ofproto.c:5391")
> 
>5 ofproto_flow_mod_learn ()   at ofproto/ofproto.c:5391
> 
>6 xlate_learn_action ()   at ofproto/ofproto-dpif-xlate.c:5378
> <'learn' action found>
>7 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6893
>8 xlate_recursively ()at ofproto/ofproto-dpif-xlate.c:4233
>9 xlate_table_action ()   at ofproto/ofproto-dpif-xlate.c:4361
>   10 in xlate_ofpact_resubmit () at ofproto/ofproto-dpif-xlate.c:4672
>   11 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6773
>   12 xlate_actions ()at ofproto/ofproto-dpif-xlate.c:7570
>  
>   13 upcall_xlate () at ofproto/ofproto-dpif-upcall.c:1197
>   14 process_upcall ()   at ofproto/ofproto-dpif-upcall.c:1413
>   15 upcall_cb ()at ofproto/ofproto-dpif-upcall.c:1315
>   16 dp_netdev_upcall (DPIF_UC_MISS) at lib/dpif-netdev.c:6236
>  
>   17 handle_packet_upcall () at lib/dpif-netdev.c:6591
>   18 fast_path_processing () at lib/dpif-netdev.c:6709
>   19 dp_netdev_input__ ()at lib/dpif-netdev.c:6797
>   20 dp_netdev_recirculate ()at lib/dpif-netdev.c:6842
>   21 dp_execute_cb ()at lib/dpif-netdev.c:7158
>   22 odp_execute_actions ()  at lib/odp-execute.c:794
>   23 dp_netdev_execute_actions ()at lib/dpif-netdev.c:7332
>   24 dpif_netdev_execute ()  at lib/dpif-netdev.c:3725
>   25 dpif_netdev_operate ()  at lib/dpif-netdev.c:3756
>  
>   26 dpif_operate () at lib/dpif.c:1367
>   27 dpif_execute () at lib/dpif.c:1321
>   28 packet_execute ()   at ofproto/ofproto-dpif.c:4760
>   29 ofproto_packet_out_finish ()at ofproto/ofproto.c:3594
>  
>   30 handle_packet_out ()at ofproto/ofproto.c:3635
>   31 handle_single_part_openflow (OFPTYPE_PACKET_OUT) at 
> ofproto/ofproto.c:8400
>   32 handle_openflow ()   at 
> ofproto/ofproto.c:8587
>   33 ofconn_run ()
>   34 connmgr_run ()
>   35 ofproto_run ()
>   36 bridge_run__ ()
>   37 bridge_run ()
>   38 main ()
> 
> Fix that by splitting the 'ofproto_packet_out_finish()' in two parts.
> First one that translates side-effects and requires holding 'ofproto_mutex'
> and the second that only pushes packet to datapath.  The second part moved
> out of 'ofproto_mutex' because 'ofproto_mutex' is not required and actually
> should not be taken in order to avoid recursive locking.
> 
> Reported-by: Anil Kumar Koli 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html
> Signed-off-by: Ilya Maximets 
> ---
> 
> Previous discussion about implementation:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362082.html
> 
> I'm not able to reproduce the original issue, so I can not test if
> this patch fixes it.
> 
>  ofproto/ofproto-dpif.c | 40 --
>  ofproto/ofproto-provider.h | 12 +---
>  ofproto/ofproto.c  | 29 +++
>  3 files changed, 64 insertions(+), 17 deletions(-)


Hi Ben,

Could you, please, take a look at this patch?

Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.

2019-11-01 Thread Ilya Maximets
Handling of OpenFlow PACKET_OUT implies pushing the packet through
the datapath and packet processing inside the datapath could trigger
an upcall.  In case of system datapath, 'dpif_execute()' sends packet
to the kernel module and returns.  If any, upcall  will be triggered
inside the kernel and handled by a separate thread in userspace.
But in case of userspace datapath full processing of the packet happens
inside the 'dpif_execute()' in the same thread that handled PACKET_OUT.
This causes an issue if upcall will lead to modification of flow rules.
For example, it could happen while processing of 'learn' actions.
Since whole handling of PACKET_OUT is protected by 'ofproto_mutex',
OVS will assert on attempt to take it recursively while processing
'learn' actions:

   0 __GI_raise (sig=sig@entry=6)
   1 __GI_abort ()
   2 ovs_abort_valist ()
   3 ovs_abort ()
   4 ovs_mutex_lock_at (where=where@entry=0xad4199 "ofproto/ofproto.c:5391")

   5 ofproto_flow_mod_learn ()   at ofproto/ofproto.c:5391

   6 xlate_learn_action ()   at ofproto/ofproto-dpif-xlate.c:5378
<'learn' action found>
   7 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6893
   8 xlate_recursively ()at ofproto/ofproto-dpif-xlate.c:4233
   9 xlate_table_action ()   at ofproto/ofproto-dpif-xlate.c:4361
  10 in xlate_ofpact_resubmit () at ofproto/ofproto-dpif-xlate.c:4672
  11 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6773
  12 xlate_actions ()at ofproto/ofproto-dpif-xlate.c:7570
 
  13 upcall_xlate () at ofproto/ofproto-dpif-upcall.c:1197
  14 process_upcall ()   at ofproto/ofproto-dpif-upcall.c:1413
  15 upcall_cb ()at ofproto/ofproto-dpif-upcall.c:1315
  16 dp_netdev_upcall (DPIF_UC_MISS) at lib/dpif-netdev.c:6236
 
  17 handle_packet_upcall () at lib/dpif-netdev.c:6591
  18 fast_path_processing () at lib/dpif-netdev.c:6709
  19 dp_netdev_input__ ()at lib/dpif-netdev.c:6797
  20 dp_netdev_recirculate ()at lib/dpif-netdev.c:6842
  21 dp_execute_cb ()at lib/dpif-netdev.c:7158
  22 odp_execute_actions ()  at lib/odp-execute.c:794
  23 dp_netdev_execute_actions ()at lib/dpif-netdev.c:7332
  24 dpif_netdev_execute ()  at lib/dpif-netdev.c:3725
  25 dpif_netdev_operate ()  at lib/dpif-netdev.c:3756
 
  26 dpif_operate () at lib/dpif.c:1367
  27 dpif_execute () at lib/dpif.c:1321
  28 packet_execute ()   at ofproto/ofproto-dpif.c:4760
  29 ofproto_packet_out_finish ()at ofproto/ofproto.c:3594
 
  30 handle_packet_out ()at ofproto/ofproto.c:3635
  31 handle_single_part_openflow (OFPTYPE_PACKET_OUT) at ofproto/ofproto.c:8400
  32 handle_openflow ()   at ofproto/ofproto.c:8587
  33 ofconn_run ()
  34 connmgr_run ()
  35 ofproto_run ()
  36 bridge_run__ ()
  37 bridge_run ()
  38 main ()

Fix that by splitting the 'ofproto_packet_out_finish()' in two parts.
First one that translates side-effects and requires holding 'ofproto_mutex'
and the second that only pushes packet to datapath.  The second part moved
out of 'ofproto_mutex' because 'ofproto_mutex' is not required and actually
should not be taken in order to avoid recursive locking.

Reported-by: Anil Kumar Koli 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html
Signed-off-by: Ilya Maximets 
---

Previous discussion about implementation:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362082.html

I'm not able to reproduce the original issue, so I can not test if
this patch fixes it.

 ofproto/ofproto-dpif.c | 40 --
 ofproto/ofproto-provider.h | 12 +---
 ofproto/ofproto.c  | 29 +++
 3 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index c35ec3e61..0466952e2 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4827,12 +4827,13 @@ ofproto_dpif_xcache_execute(struct ofproto_dpif 
*ofproto,
 }
 
 static void
-packet_execute(struct ofproto *ofproto_, struct ofproto_packet_out *opo)
+packet_execute_prepare(struct ofproto *ofproto_,
+   struct ofproto_packet_out *opo)
 OVS_REQUIRES(ofproto_mutex)
 {
 struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
 struct dpif_flow_stats stats;
-struct dpif_execute execute;
+struct dpif_execute *execute;
 
 struct ofproto_dpif_packet_out *aux = opo->aux;
 ovs_assert(aux);
@@ -4841,22 +4842,40 @@ packet_execute(struct ofproto *ofproto_, struct 
ofproto_packet_out *opo)
 dpif_flow_stats_extract(opo->flow, opo->packet, time_msec(), &stats);
 ofproto_dpif_xcache_execute(ofproto, &aux->xcache, &sta