On 7 Apr 2024, at 3:57, LIU Yulong wrote:

> Hi Eelco,
>
> Sorry for the late reply. And thank you very much for your code examination.
>
> Today, I removed that "ovsrcu_quiesce();" at line 3066 [1] and started
> running the test in my local environment. Let's see if this change can
> overcome the issue.

Let's see…

> Regarding the long-held lock or quiesce state, Ilya's feedback seemed to
> hold true during some service restarts or large flow installations
> (with or without the lock move fix).

This looks worrisome anyway and might be worth investigating (without your 
patch).

> I noticed logs like this:
>
> 2024-04-07T01:33:53.025Z|00001|ovs_rcu(urcu2)|WARN|blocked 1001 ms
> waiting for main to quiesce
> 2024-04-07T01:33:54.025Z|00002|ovs_rcu(urcu2)|WARN|blocked 2000 ms
> waiting for main to quiesce
> 2024-04-07T01:33:56.025Z|00003|ovs_rcu(urcu2)|WARN|blocked 4000 ms
> waiting for main to quiesce
> 2024-04-07T01:34:00.025Z|00004|ovs_rcu(urcu2)|WARN|blocked 8000 ms
> waiting for main to quiesce
> 2024-04-07T01:34:08.025Z|00005|ovs_rcu(urcu2)|WARN|blocked 16001 ms
> waiting for main to quiesce
>
> And
>
> 2024-04-03T03:38:20.817Z|00136|ovs_rcu(urcu2)|WARN|blocked 2640 ms
> waiting for dpdk_watchdog1 to quiesce
> 2024-04-03T03:38:20.817Z|00137|ovs_rcu(urcu2)|WARN|blocked 2640 ms
> waiting for dpdk_watchdog1 to quiesce
> 2024-04-03T03:38:53.624Z|00138|ovs_rcu(urcu2)|WARN|blocked 1066 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:38:57.606Z|00139|ovs_rcu(urcu2)|WARN|blocked 5048 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:38:57.606Z|00140|ovs_rcu(urcu2)|WARN|blocked 5048 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00141|ovs_rcu(urcu2)|WARN|blocked 54547 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00142|ovs_rcu(urcu2)|WARN|blocked 54548 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:39:47.105Z|00143|ovs_rcu(urcu2)|WARN|blocked 54548 ms
> waiting for revalidator6 to quiesce
> 2024-04-03T03:42:30.702Z|00144|ovs_rcu(urcu2)|WARN|blocked 10937 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.703Z|00145|ovs_rcu(urcu2)|WARN|blocked 10939 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.704Z|00146|ovs_rcu(urcu2)|WARN|blocked 10939 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:30.704Z|00147|ovs_rcu(urcu2)|WARN|blocked 10939 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:39.083Z|00151|ovs_rcu(urcu2)|WARN|blocked 19318 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:42:52.928Z|00155|ovs_rcu(urcu2)|WARN|blocked 33164 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:45:13.305Z|00156|ovs_rcu(urcu2)|WARN|blocked 173540 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:45:13.305Z|00157|ovs_rcu(urcu2)|WARN|blocked 173540 ms
> waiting for pmd-c03/id:8 to quiesce
> 2024-04-03T03:47:58.186Z|00158|ovs_rcu(urcu2)|WARN|blocked 338421 ms
> waiting for pmd-c03/id:8 to quiesce
>
>
> Not sure if this is related to the core issue.

I'd guess it's the opposite: if we do not go through the quiescent state,
the entries will not be deleted :)
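
For reference, the deferred-free pattern involved here works roughly as below
(a minimal sketch of the lib/ovs-rcu usage, not the actual implementation):

  /* Sketch: the PMD schedules the free, but it only happens later. */
  ovsrcu_postpone(ukey_delete__, old_ukey);  /* Readers may still hold
                                              * pointers to old_ukey. */
  ...
  ovsrcu_quiesce();  /* The calling thread declares it holds no
                      * RCU-protected pointers at this moment. */
  /* Only after *every* registered thread has quiesced does the RCU (urcu)
   * thread run the postponed ukey_delete__().  The "blocked N ms waiting
   * for <thread> to quiesce" warnings above mean one thread has not
   * reported a quiescent state for N ms, so postponed frees are stalled
   * behind it. */

So a thread blocking for a long time delays cleanup; it should not cause a
premature free.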

>
> [1] 
> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L3066
>
>
> LIU Yulong
>
>
> On Fri, Mar 22, 2024 at 8:56 PM Eelco Chaudron <echau...@redhat.com> wrote:
>>
>>
>>
>> On 15 Mar 2024, at 11:04, LIU Yulong wrote:
>>
>>> A potential race condition happened among the following 3 threads:
>>> * The PMD thread replaced the old_ukey and transitioned its state to
>>>   UKEY_DELETED.
>>> * The RCU thread was freeing the old_ukey mutex.
>>> * Meanwhile, the revalidator thread was trying to lock the old_ukey mutex.
>>>
>>> We added timestamps to the udpif_key state transitions and the
>>> udpif_key mutex free action, as well as to the sweep's try_lock of the
>>> same udpif_key. Whenever ovs-vswitchd dumped core, the udpif_key mutex
>>> try_lock was always slightly later than the udpif_key mutex free
>>> action. For instance [3]:
>>> ukey_destroy_time = 13217289156490
>>> sweep_now = 13217289156568
>>>
>>> The second timestamp is 78us behind the first.
>>>
>>> The vswitchd process then aborts in the revalidator thread's try_lock of
>>> ukey->mutex because of a NULL pointer.
>>>
>>> This patch adds a try_lock on the ukey's bucket (the udpif_key umap) so
>>> that the PMD and revalidator threads do not access the same map at the
>>> same time while replacing the udpif_key and transitioning its state.
>>>
>>> More details can be found at:
>>> [1] 
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
>>> [2] 
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
>>> [3] 
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>>>
>>> Signed-off-by: LIU Yulong <i...@liuyulong.me>
>>
>> Hi LIU,
>>
>> I've examined a lot of code, but I can't seem to figure out what event could 
>> be causing your problem. I also don't understand why your fix would prevent 
>> the problem from occurring (aside from possibly avoiding some quiescent 
>> state). Here's some of my analysis.
>>
>> Assume PMD is doing try_ukey_replace():
>>
>>   1930  try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
>>   1931                   struct udpif_key *new_ukey)
>>   1932      OVS_REQUIRES(umap->mutex)
>>   1933      OVS_TRY_LOCK(true, new_ukey->mutex)
>>   1934  {
>>   1935      bool replaced = false;
>>   1936
>>   1937      if (!ovs_mutex_trylock(&old_ukey->mutex)) {
>>   1938          if (old_ukey->state == UKEY_EVICTED) {
>>   1939              /* The flow was deleted during the current revalidator dump,
>>   1940               * but its ukey won't be fully cleaned up until the sweep phase.
>>   1941               * In the mean time, we are receiving upcalls for this traffic.
>>   1942               * Expedite the (new) flow install by replacing the ukey. */
>>   1943              ovs_mutex_lock(&new_ukey->mutex);
>>   1944              cmap_replace(&umap->cmap, &old_ukey->cmap_node,
>>   1945                           &new_ukey->cmap_node, new_ukey->hash);
>>   1946              new_ukey->dump_seq = old_ukey->dump_seq;
>>   1947              ovsrcu_postpone(ukey_delete__, old_ukey);
>>   1948              transition_ukey(old_ukey, UKEY_DELETED);
>>   1949              transition_ukey(new_ukey, UKEY_VISIBLE);
>>   1950              replaced = true;
>>   1951          }
>>   1952          ovs_mutex_unlock(&old_ukey->mutex);
>>   1953      }
>>   ...
>>
>> From this code, it's evident that the ukey deletion doesn't occur
>> immediately, but only after an RCU quiescent period has completed.
>> The function revalidator_sweep__() doesn't include any ovsrcu_quiesce()
>> calls within the CMAP loop; instead, it only quiesces when moving on to
>> the next slice. See the code snippet below.
>>
>> 2983  static void
>> 2984  revalidator_sweep__(struct revalidator *revalidator, bool purge)
>> 2985  {
>> ....
>> 2995
>> 2996      for (int i = slice; i < N_UMAPS; i += udpif->n_revalidators) {
>> ....
>> 3004
>> 3005          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>> 3006              enum flow_del_reason del_reason = FDR_NONE;
>> 3007              enum ukey_state ukey_state;
>> 3008
>> 3009              /* Handler threads could be holding a ukey lock while it installs a
>> 3010               * new flow, so don't hang around waiting for access to it. */
>> 3011              if (ovs_mutex_trylock(&ukey->mutex)) {
>> 3012                  continue;
>> 3013              }
>> 3037                      reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs,
>> 3038                                    &odp_actions);
>> 3039                  }
>>
>> 3040              OVS_USDT_PROBE(revalidator_sweep__, flow_sweep_result, udpif,
>> 3041                                 ukey, result, del_reason);
>> 3042              }
>> 3043              ovs_mutex_unlock(&ukey->mutex);
>> ....
>> 3045              if (ukey_state == UKEY_EVICTED) {
>> 3046                  /* The common flow deletion case involves deletion of the flow
>> 3047                   * during the dump phase and ukey deletion here. */
>> 3048                  ovs_mutex_lock(&umap->mutex);
>> 3049                  ukey_delete(umap, ukey);
>> 3050                  ovs_mutex_unlock(&umap->mutex);
>> 3051              }
>> 3052
>> 3053              if (n_ops == REVALIDATE_MAX_BATCH) {
>> 3054                  /* Update/delete missed flows and clean up corresponding ukeys
>> 3055                   * if necessary. */
>> 3056                  push_ukey_ops(udpif, umap, ops, n_ops);
>> 3057                  n_ops = 0;
>> 3058              }
>> 3059          }
>> ....
>> 3066          ovsrcu_quiesce();
>>               ^^ Here we call it.
>> 3067      }
>> 3068  }
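>>
>> To make the suspected window concrete, here is a hypothetical interleaving
>> (a sketch of the theory only, not something the code above should allow):
>>
>>   /*   revalidator (sweep)             PMD (try_ukey_replace)
>>    *   -------------------             ----------------------
>>    *   CMAP_FOR_EACH -> ukey A
>>    *                                   cmap_replace(A -> B)
>>    *                                   ovsrcu_postpone(ukey_delete__, A)
>>    *                                   transition_ukey(A, UKEY_DELETED)
>>    *   <quiescent state here?>   <-- would let the RCU thread free A
>>    *                                 (destroying A->mutex) ...
>>    *   ovs_mutex_trylock(&A->mutex) <-- ... turning this into a
>>    *                                    use-after-free.
>>    */
>>
>> As long as the sweep only quiesces between slices (line 3066 above), that
>> window should not exist while it still holds a pointer to A.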
>>
>> So maybe you could try to remove the ovsrcu_quiesce() above, as this seems
>> to be the only hint as to why we could go through a quiescent state (maybe
>> some compiler optimization)?
>>
>> If this does not solve the problem, we might need to come up with some more
>> debugging. Let me know and I can try to put some debug code together.
>>
>> //Eelco
>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
