[ovs-discuss] problem about cls pvector may cause ovs-vswitchd crash

2020-09-16 Thread Yinpeijun
Hi ALL,

This problem has been fixed. Updated info follows:


1)   The following patch fixes the problem that occurred when using the test
code from the previous mail:

https://www.mail-archive.com/ovs-dev@openvswitch.org/msg41202.html


2)   What we need here is memory_order_acquire semantics when reading
impl->size, to make sure the current CPU reads the impl->vector contents
correctly.


3)   The key point here is that release/consume and release/acquire must be
used in pairs. If ptr is inserted into pvec->temp and published later with
ovsrcu_set, then ovsrcu_get can read the correct pvector->impl->vector
contents (both ptr and *ptr). But sometimes pvector_insert works with impl
unchanged and only release-stores impl->size, so impl->size must be read with
acquire semantics to get the correct impl->vector contents. pvector->size and
pvector->vector have no data dependency, so acquire must be used here, not
consume.

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] problem about cls pvector may cause ovs-vswitchd crash

2020-09-15 Thread Yinpeijun
Hi All,

Recently we found a problem, as follows:


1.  Problem description:

PVECTOR_FOR_EACH uses ovsrcu_get to get the pvector's current impl pointer,
and the memory_order_consume in ovsrcu_get ensures that *impl is read after
that load, so we can get the correct ptr value in impl->vector[0]; but it
seems that we cannot make sure that the *ptr value is also correct.


2.  Verification through testing:

Copy the test code below into lib/dpif-netdev.c, then modify the function
pmd_thread_main by inserting the following line in its for (;;) loop:

do_atomic_test(pmd);

If do_atomic_consumer is implemented without the mutex lock, we can easily
get the following log in ovs-vswitchd.log:

my_itrator:143, my_value:144

(We may get other values, but my_itrator = my_value - 1 stays true.)

The consumer thread can read stale memory with memory_order_consume; even if
we change ovsrcu_get to load values with memory_order_acquire, we still get
the error message (my_itrator = my_value - 1).

We can fix this problem if the consumer thread accesses the data under
g_pvector_lock's protection, but pvector is designed to be used without
locks. Where did it go wrong? Can anyone here give some comments?


----- An error case that actually happened while using pvector -----
Consider the following scenario:

1)   a handler thread inserts a dpcls_subtable, whose address is p,
into the datapath classifier's (struct dpcls) pvector;

2)   a pmd thread calls fast_path_processing to find the subtable and
accesses address p;

3)   the subtable above is destroyed by someone;

4)   a handler thread creates another subtable, whose address is also p,
and inserts it into the dpcls's pvector;

5)   the pmd thread calls fast_path_processing -> dp_netdev_pmd_lookup_flow
-> dpcls_lookup to access the subtable from the pvector.

We use PVECTOR_FOR_EACH to iterate the pvector in step 5; no lock is used,
and the resulting stale memory access can make the ovs-vswitchd process
coredump.

----- The test code is as follows -----
static struct pvector g_atomic_test;
static unsigned char g_my_data[65536];
static unsigned char g_value = 0;
static int g_index = 0;
static struct ovs_mutex g_pvector_lock = OVS_MUTEX_INITIALIZER;

static void do_atomic_producer(void)
{
    ovs_mutex_lock(&g_pvector_lock);
    if (g_index < 65536) {
        g_my_data[g_index] = g_value;
        pvector_insert(&g_atomic_test, (void *) &g_my_data[g_index], 0);
        g_index++;
    } else {
        for (int i = 0; i < g_index; i++) {
            pvector_remove(&g_atomic_test, (void *) &g_my_data[i]);
        }
        g_index = 0;
        g_value++; /* g_value may wrap around to zero. */
    }
    pvector_publish(&g_atomic_test);
    ovs_mutex_unlock(&g_pvector_lock);
}

static void do_atomic_consumer(void)
{
    unsigned char *my_itr;
    unsigned char my_value = 0;
    bool first = true;

    /* ovs_mutex_lock(&g_pvector_lock); */
    PVECTOR_FOR_EACH (my_itr, &g_atomic_test) {
        if (first) {
            my_value = *my_itr;
            first = false;
        } else if (*my_itr != my_value) {
            VLOG_ERR("my_itrator:%u, my_value:%u\n", *my_itr, my_value);
            break;
        }
    }
    /* ovs_mutex_unlock(&g_pvector_lock); */
}

static void do_atomic_test(struct dp_netdev_pmd_thread *pmd)
{
    if (pmd->core_id == 1) {
        do_atomic_producer();
    } else {
        do_atomic_consumer();
    }
}