Hi ALL, This problem is fixed, update info as follows:
1) The following patch can fix the problem happened when using the testing code of the previous mail. https://www.mail-archive.com/ovs-dev@openvswitch.org/msg41202.html 2) What we need here is a memory_order_acquire sematics when getting impl->size, to make sure, the current cpu to get impl->vector content correctly. 3) The key point here is release/consume, release/require must be used in paired. If ptr is inserted into pvec->temp and published later with ovsrcu_set, ovsrcu_get could get the correct pvector->impl->vector content (both ptr and *ptr). But sometimes pvector_insert is worked with impl unchanged, and only release impl->size, so impl->size need to be get use acquire sematics to get correct impl->vector content. pvector->size and pvector->vector has no data-dependent relationship, so acquire must be used here, not consume. >Hi All, > >Recently we found a problem, as follow: > >1. Problem description: >PVECTOR_FOR_EACH use ovsrcu_get to get pvector’s current impl pointer, and the >memory_order_consume >in ovsrcu_get will ensure *impl is read after this instruction, so we can get >the the correct ptr value in >impl->vector[0], but it seems that we cannot make sure that the *ptr value is >also correct. > >2. Verification through testing: >Copy the test code into file lib/dpif-netdev.c, and the modify fuction >pmd_thread_main, insert the following line >in the for (;;) loop: >do_atomic_test(pmd); > if the do_atomic_consumer is implement without mutex lock, we can easily get > the following log in ovs-vswitchd.log: > ----my_itrator:143, my_value:144 (we may get other values, and > my_itrator = my_value - 1 stay true. > > The consumer thread can get dirty memory with memory_order_consume, if we > change ovsrcu_get to get values > with memory_order_acquire, we can still get the error message (my_itrator = > my_value - 1). > > We can fix this problem if consumer thread access data with g_pvector_lock’ > protection, but pvector is designed >to use without locks. Where did it go wrong? Can anyone here give some >comments? > > >--------------------------------------------- A error case truly happened >whining using pvector -------------------------- >Suppose a scenario as follows: >1) handler thread insert a dpcls_subtable, whose address is p, >to datapath classifier’s (struct dpcls) pvector; >2) pmd thread call fast_path_processing to find the subtable and access >address p; >3) the subtable above is destroyed by someone; >4) handler thread insert another subtable, whose address is also p, >and insert to dpcls’s pvector; >5) pmd thread call fast_path_processing -> dp_netdev_pmd_lookup_flow >-> dpcls_lookup to access the subtable from pvector. > >We use PVECTOR_FOR_EACH to iterate the pvector in step 5, no locks is used, >dirty memory access may >cause to ovs-vswitchd process coredump. > >----------------------------------------------Testing codes are as follows >--------------------------------------------- >static struct pvector g_atomic_test; >static unsigned char g_my_data[65536]; >static unsigned char g_value = 0; >static int g_index = 0; >static struct ovs_mutex g_pvector_lock = OVS_MUTEX_INITIALIZER; > >static void do_atomic_producer(void) >{ > ovs_mutex_lock(&g_pvector_lock); > if (g_index < 65536) { > g_my_data[g_index] = g_value; > pvector_insert(&g_atomic_test, (void *)&g_my_data[g_index], 0); > g_index++; > } else { > for (int i = 0; i < g_index; i++) { > pvector_remove(&g_atomic_test, (void *)&g_my_data[i]); > } > g_index = 0; > g_value++; //value can loops to zero; > } > pvector_publish(&g_atomic_test); > ovs_mutex_unlock(&g_pvector_lock); >} > >static void do_atomic_consumer(void) >{ > unsigned char *my_itr; > unsigned char my_value; > bool first = true; > > // ovs_mutex_lock(&g_pvector_lock); > PVECTOR_FOR_EACH (my_itr, &g_atomic_test) { > if (first) { > my_value = *my_itr; > first = false; > } else if (*my_itr != my_value) { > VLOG_ERR("----my_itrator:%u, my_value:%u\n", *my_itr, my_value); > break; > } > } > // ovs_mutex_unlock(&g_pvector_lock); >} > >static void do_atomic_test(struct dp_netdev_pmd_thread *pmd) >{ > if (pmd->core_id == 1) { > do_atomic_producer(); > } else { > do_atomic_consumer(); > } >}
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss