On 7 Apr 2024, at 3:57, LIU Yulong wrote:

> Hi Eelco,
>
> Sorry for the late reply. And thank you very much for your code
> examination.
>
> Today, I removed that "ovsrcu_quiesce();" at line 3066 [1] and started
> to run the test in my local environment. Let's see if such a change
> can overcome the issue.

Let's see…

> Regarding the long lock or quiesce times, Ilya's feedback seemed to be
> true during some service restarts or large amounts of flow
> installations (no matter with or without the lock move fix).

This looks worrisome anyway and might be worth investigating (without
your patch).

> I noticed logs like this:
>
> 2024-04-07T01:33:53.025Z|00001|ovs_rcu(urcu2)|WARN|blocked 1001 ms waiting for main to quiesce
> 2024-04-07T01:33:54.025Z|00002|ovs_rcu(urcu2)|WARN|blocked 2000 ms waiting for main to quiesce
> 2024-04-07T01:33:56.025Z|00003|ovs_rcu(urcu2)|WARN|blocked 4000 ms waiting for main to quiesce
> 2024-04-07T01:34:00.025Z|00004|ovs_rcu(urcu2)|WARN|blocked 8000 ms waiting for main to quiesce
> 2024-04-07T01:34:08.025Z|00005|ovs_rcu(urcu2)|WARN|blocked 16001 ms waiting for main to quiesce
>
> And:
>
> 2024-04-03T03:38:20.817Z|00136|ovs_rcu(urcu2)|WARN|blocked 2640 ms waiting for dpdk_watchdog1 to quiesce
> 2024-04-03T03:38:20.817Z|00137|ovs_rcu(urcu2)|WARN|blocked 2640 ms waiting for dpdk_watchdog1 to quiesce
> 2024-04-03T03:38:53.624Z|00138|ovs_rcu(urcu2)|WARN|blocked 1066 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:38:57.606Z|00139|ovs_rcu(urcu2)|WARN|blocked 5048 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:38:57.606Z|00140|ovs_rcu(urcu2)|WARN|blocked 5048 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00141|ovs_rcu(urcu2)|WARN|blocked 54547 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00142|ovs_rcu(urcu2)|WARN|blocked 54548 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00143|ovs_rcu(urcu2)|WARN|blocked 54548 ms waiting for revalidator6 to quiesce
> 2024-04-03T03:42:30.702Z|00144|ovs_rcu(urcu2)|WARN|blocked 10937 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.703Z|00145|ovs_rcu(urcu2)|WARN|blocked 10939 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.704Z|00146|ovs_rcu(urcu2)|WARN|blocked 10939 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.704Z|00147|ovs_rcu(urcu2)|WARN|blocked 10939 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:39.083Z|00151|ovs_rcu(urcu2)|WARN|blocked 19318 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:52.928Z|00155|ovs_rcu(urcu2)|WARN|blocked 33164 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:45:13.305Z|00156|ovs_rcu(urcu2)|WARN|blocked 173540 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:45:13.305Z|00157|ovs_rcu(urcu2)|WARN|blocked 173540 ms waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:47:58.186Z|00158|ovs_rcu(urcu2)|WARN|blocked 338421 ms waiting for pmd-c03/id:8 to quiesce
>
> Not sure if this is related to the core issue.

I'd guess it's the opposite: if we do not go through the quiescent
state, the entries will not be deleted. :)
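To make that relationship concrete, here is a minimal sketch of the
contract behind ovsrcu_postpone()/ovsrcu_quiesce(). It would only
compile inside the OVS tree (lib/ovs-rcu.h), and struct my_object,
writer() and reader() are made-up names for illustration, not the
actual upcall code:

#include <stdio.h>
#include <stdlib.h>

#include "ovs-rcu.h"    /* lib/ovs-rcu.h in the OVS tree. */

struct my_object {
    int value;
};

/* Deferred destructor: runs only after a full RCU grace period, i.e.
 * after every registered thread has quiesced at least once. */
static void
free_object(struct my_object *obj)
{
    free(obj);
}

/* Writer side: after unlinking 'old' from the shared structure,
 * schedule the free instead of calling free() directly.  Same pattern
 * as ovsrcu_postpone(ukey_delete__, old_ukey) in try_ukey_replace(). */
static void
writer(struct my_object *old)
{
    ovsrcu_postpone(free_object, old);
}

/* Reader side: a pointer taken from a shared structure stays valid
 * only until this thread quiesces. */
static void
reader(struct my_object *obj)
{
    printf("%d\n", obj->value);   /* Safe: we have not quiesced yet. */

    ovsrcu_quiesce();             /* The grace period may end here and
                                   * free_object() may run. */

    /* printf("%d\n", obj->value);   <-- would be a use-after-free. */
}

This is also why the "blocked N ms waiting for <thread> to quiesce"
warnings above are the opposite of the crash: a thread that never
quiesces stalls all postponed frees, it doesn't trigger them.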
> [1]
> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L3066
>
>
> LIU Yulong
>
>
> On Fri, Mar 22, 2024 at 8:56 PM Eelco Chaudron <echau...@redhat.com> wrote:
>>
>> On 15 Mar 2024, at 11:04, LIU Yulong wrote:
>>
>>> A potential race condition happened between the following 3 threads:
>>> * The PMD thread replaced the old_ukey and transitioned the state to
>>>   UKEY_DELETED.
>>> * The RCU thread was freeing the old_ukey mutex.
>>> * Meanwhile, the revalidator thread was trying to lock the old_ukey
>>>   mutex.
>>>
>>> We added timestamps to the udpif_key state transitions and the
>>> udpif_key mutex free action, as well as to the sweep trylock for the
>>> same udpif_key. Whenever ovs-vswitchd dumped core, the udpif_key
>>> mutex trylock always came a bit later than the udpif_key mutex free
>>> action. For instance [3]:
>>> ukey_destroy_time = 13217289156490
>>> sweep_now = 13217289156568
>>>
>>> The second timestamp is 78 us behind the first one.
>>>
>>> The vswitchd process then aborts in the revalidator thread's trylock
>>> of ukey->mutex because of the NULL pointer.
>>>
>>> This patch adds a trylock on the ukey's basket udpif_key map to
>>> prevent the PMD and revalidator threads from accessing the same map
>>> while replacing the udpif_key and transitioning the udpif_key state.
>>>
>>> More details can be found at:
>>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
>>> [2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
>>> [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>>>
>>> Signed-off-by: LIU Yulong <i...@liuyulong.me>
>>
>> Hi LIU,
>>
>> I've examined a lot of code, but I can't seem to figure out what event
>> could be causing your problem. I also don't understand why your fix
>> would prevent the problem from occurring (aside from possibly avoiding
>> some quiescent state). Here's some of my analysis.
>>
>> Assume the PMD is doing try_ukey_replace():
>>
>> 1930 try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
>> 1931                  struct udpif_key *new_ukey)
>> 1932     OVS_REQUIRES(umap->mutex)
>> 1933     OVS_TRY_LOCK(true, new_ukey->mutex)
>> 1934 {
>> 1935     bool replaced = false;
>> 1936
>> 1937     if (!ovs_mutex_trylock(&old_ukey->mutex)) {
>> 1938         if (old_ukey->state == UKEY_EVICTED) {
>> 1939             /* The flow was deleted during the current revalidator dump,
>> 1940              * but its ukey won't be fully cleaned up until the sweep phase.
>> 1941              * In the mean time, we are receiving upcalls for this traffic.
>> 1942              * Expedite the (new) flow install by replacing the ukey. */
>> 1943             ovs_mutex_lock(&new_ukey->mutex);
>> 1944             cmap_replace(&umap->cmap, &old_ukey->cmap_node,
>> 1945                          &new_ukey->cmap_node, new_ukey->hash);
>> 1946             new_ukey->dump_seq = old_ukey->dump_seq;
>> 1947             ovsrcu_postpone(ukey_delete__, old_ukey);
>> 1948             transition_ukey(old_ukey, UKEY_DELETED);
>> 1949             transition_ukey(new_ukey, UKEY_VISIBLE);
>> 1950             replaced = true;
>> 1951         }
>> 1952         ovs_mutex_unlock(&old_ukey->mutex);
>> 1953     }
>> ...
>>
>> From this code it's evident that the ukey deletion doesn't occur
>> immediately, but rather after completing an RCU quiescent period.
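Note that the ovs_mutex_trylock() on old_ukey->mutex above (and the one
in revalidator_sweep__() below) is only safe because the postponed
ukey_delete__() keeps the ukey memory alive until the grace period
ends. Here is a stand-alone illustration of what happens once the free
has actually run (plain pthreads, not OVS code):

#include <pthread.h>
#include <stdlib.h>

int
main(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);

    pthread_mutex_init(m, NULL);
    pthread_mutex_destroy(m);
    free(m);

    /* Undefined behavior: the mutex memory is gone.  Depending on what
     * the allocator has done with it in the meantime, this may appear
     * to work, return an error, or crash. */
    pthread_mutex_trylock(m);

    return 0;
}

Whether this actually aborts depends on the state of the freed memory,
which would fit the small (~78 us) windows you measured.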
>> The function revalidator_sweep__() doesn't include any
>> ovsrcu_quiesce() calls within the CMAP loop; instead, it executes one
>> at the end of each slice. See the code snippet below.
>>
>> 2983 static void
>> 2984 revalidator_sweep__(struct revalidator *revalidator, bool purge)
>> 2985 {
>> ....
>> 2995
>> 2996     for (int i = slice; i < N_UMAPS; i += udpif->n_revalidators) {
>> ....
>> 3004
>> 3005         CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>> 3006             enum flow_del_reason del_reason = FDR_NONE;
>> 3007             enum ukey_state ukey_state;
>> 3008
>> 3009             /* Handler threads could be holding a ukey lock while it installs a
>> 3010              * new flow, so don't hang around waiting for access to it. */
>> 3011             if (ovs_mutex_trylock(&ukey->mutex)) {
>> 3012                 continue;
>> 3013             }
>> ....
>> 3037                     reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs,
>> 3038                                   &odp_actions);
>> 3039                 }
>> 3040                 OVS_USDT_PROBE(revalidator_sweep__, flow_sweep_result, udpif,
>> 3041                                ukey, result, del_reason);
>> 3042             }
>> 3043             ovs_mutex_unlock(&ukey->mutex);
>> ....
>> 3045             if (ukey_state == UKEY_EVICTED) {
>> 3046                 /* The common flow deletion case involves deletion of the flow
>> 3047                  * during the dump phase and ukey deletion here. */
>> 3048                 ovs_mutex_lock(&umap->mutex);
>> 3049                 ukey_delete(umap, ukey);
>> 3050                 ovs_mutex_unlock(&umap->mutex);
>> 3051             }
>> 3052
>> 3053             if (n_ops == REVALIDATE_MAX_BATCH) {
>> 3054                 /* Update/delete missed flows and clean up corresponding ukeys
>> 3055                  * if necessary. */
>> 3056                 push_ukey_ops(udpif, umap, ops, n_ops);
>> 3057                 n_ops = 0;
>> 3058             }
>> 3059         }
>> ....
>> 3066         ovsrcu_quiesce();
>>              ^^ Here we call it.
>> 3067     }
>> 3068 }
>>
>> So maybe you could try removing the ovsrcu_quiesce() above, as it
>> seems to be the only hint of how we could go through a quiescent
>> state here (maybe some compiler optimization)?
>>
>> If this does not solve the problem, we might need to come up with
>> some more debugging. Let me know, and I can try to put some debug
>> code together.
>>
>> //Eelco

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev