On 16 Dec 2022, at 8:56, Peng He wrote:

> From: Peng He <[email protected]>
> To: Eelco Chaudron <[email protected]>
> Cc: Ilya Maximets <[email protected]>, [email protected]
> Subject: Re: [ovs-dev v7 1/3] ofproto-dpif-upcall: fix push_dp_ops
> Date: Fri, 16 Dec 2022 15:56:32 +0800
>
> Eelco Chaudron <[email protected]> 于2022年12月13日周二 20:36写道:
>
>>
>>
>> On 10 Dec 2022, at 1:37, Peng He wrote:
>>
>>> Patch v5 has statistics issues.
>>>
>>> In order to solve this issue, we had a discussion.
>>>
>>> below is the quote of the email.
>>>
>>> ”
>>> After a second thought, I think maybe keeping INCONSISTENT just for the
>>> modify error is a better option.
>>>
>>> With current patch:
>>> 1.
>>> the modify error case:
>>> OPERATIONAL -> INCONSISTENT ->  EVICTING -> EVICTED
>>> 2.
>>> the delete error case:
>>> EVICTING -> EVICTED
>>>
>>> Change both to INCONSISTENT:
>>>
>>> the modify error case:
>>> did not change.
>>>
>>> the delete error case:
>>> EVICTING -> INCONSISTENT -> EVICTED?
>>>
>>> “
>>>
>>> And we agree to take the second solution.
>>
>> I know, but going over the state meanings again, UKEY_EVICTING means the
>> following:
>>
>>  /* Ukey is in umap, datapath flow delete is queued. */
>>
>> Which now no longer is the case, so should a new state not make more sense?
>>
>
> Why it's no longer valid?
>
> In the patch, only modify failed ukey will be set to EVICTING, is it just
> right fit the meaning of
> EVICTING? (ukey in the umap, but delete operation is queued?)

But it’s not as the delete operation is not queued, that is done in the 
revalidator_sweep__() part.

>>
>> Any one else has some input on this??
>>
>>> Eelco Chaudron <[email protected]> 于2022年12月8日周四 18:54写道:
>>>
>>>>
>>>>
>>>> On 27 Nov 2022, at 8:28, Peng He wrote:
>>>>
>>>>> push_dp_ops only handles delete ops errors but ignores the modify
>>>>> ops results. It's better to handle all the dp operation errors in
>>>>> a consistent way.
>>>>>
>>>>> We observe in the production environment that sometimes a megaflow
>>>>> with wrong actions keep staying in datapath. The coverage command shows
>>>>> revalidators have dumped several times, however the correct
>>>>> actions are not set. This implies that the ukey's action does not
>>>>> equal to the meagaflow's, i.e. revalidators think the underlying
>>>>> megaflow's actions are correct however they are not.
>>>>>
>>>>> We also check the megaflow using the ofproto/trace command, and the
>>>>> actions are not matched with the ones in the actual magaflow. By
>>>>> performing a revalidator/purge command, the right actions are set.
>>>>>
>>>>> This patch prevents the inconsistency by considering modify failure
>>>>> in revalidators.
>>>>>
>>>>> To note, we cannot perform two state transitions and change ukey_state
>>>>> into UKEY_EVICTED directly here, because, if we do so, the
>>>>> sweep will remove the ukey alone and leave dp flow alive. Later, the
>>>>> dump will retrieve the dp flow and might even recover it. This will
>>>>> contribute the stats of this dp flow twice.
>>>>>
>>>>> Signed-off-by: Peng He <[email protected]>
>>>>> ---
>>>>>  ofproto/ofproto-dpif-upcall.c | 34 +++++++++++++++++++++++-----------
>>>>>  1 file changed, 23 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c
>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>> index 7ad728adf..c2cefbeb8 100644
>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>> @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>>>> *ops, size_t n_ops)
>>>>>
>>>>>      for (i = 0; i < n_ops; i++) {
>>>>>          struct ukey_op *op = &ops[i];
>>>>> -        struct dpif_flow_stats *push, *stats, push_buf;
>>>>> -
>>>>> -        stats = op->dop.flow_del.stats;
>>>>> -        push = &push_buf;
>>>>> -
>>>>> -        if (op->dop.type != DPIF_OP_FLOW_DEL) {
>>>>> -            /* Only deleted flows need their stats pushed. */
>>>>> -            continue;
>>>>> -        }
>>>>>
>>>>>          if (op->dop.error) {
>>>>> -            /* flow_del error, 'stats' is unusable. */
>>>>>              if (op->ukey) {
>>>>>                  ovs_mutex_lock(&op->ukey->mutex);
>>>>> -                transition_ukey(op->ukey, UKEY_EVICTED);
>>>>> +                if (op->dop.type == DPIF_OP_FLOW_DEL) {
>>>>> +                    transition_ukey(op->ukey, UKEY_EVICTED);
>>>>> +                } else {
>>
>> I think we could use a comment here to make sure why we set it to
>> evicting. Maybe just a reference to the comment in revalidator_sweep__()
>> might be enough.
>>
>>>>> +                    transition_ukey(op->ukey, UKEY_EVICTING);
>>>>> +                }
>>>>>                  ovs_mutex_unlock(&op->ukey->mutex);
>>>>>              }
>>>>>              continue;
>>>>>          }
>>>>>
>>>>> +        if (op->dop.type != DPIF_OP_FLOW_DEL) {
>>>>> +            /* Only deleted flows need their stats pushed. */
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        struct dpif_flow_stats *push, *stats, push_buf;
>>>>> +
>>>>> +        stats = op->dop.flow_del.stats;
>>>>> +        push = &push_buf;
>>>>> +
>>>>>          if (op->ukey) {
>>>>>              ovs_mutex_lock(&op->ukey->mutex);
>>>>>              transition_ukey(op->ukey, UKEY_EVICTED);
>>>>> @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator
>>>> *revalidator, bool purge)
>>>>>                  continue;
>>>>>              }
>>>>>              ukey_state = ukey->state;
>>>>> +
>>>>> +            if (ukey_state == UKEY_EVICTING) {
>>>>> +                /* previous modify operation fails on this ukey and
>>>> ukey_state
>>>>> +                 * is set to UKEY_EVICTING, issue a delete operation
>> on
>>>> this
>>>>> +                 * ukey.
>>>>> +                 */
>>>>> +                delete_op_init(udpif, &ops[n_ops++], ukey);
>>
>> This would cause the later push_ukey_ops() to call ukey_delete() which
>> will call transition_ukey(ukey, UKEY_DELETED).
>> Which I think will result in an invalid state transaction from
>> UKEY_EVICTING to UKEY_DELETED.
>>
>
> No, push_ukey_ops calls  push_dp_ops, and push_dp_ops will delete the
> datapath key first and then set the state to UKEY_EVICTED.

ACK, you are right. Missed the transition in the successful case as we have a 
DEL operation.

>>>>
>>>> How can we be sure this state here is only for failing to modify
>>>> operations?
>>>> This state is set in other places in the code also.
>>>>
>>>
>>> because revalidate_sweep is called after revaldiate. all the datapath ops
>>> have already been
>>> pushed, and if there are ukeys at UKEY_EVICTING, it's because previously
>>> datapath ops
>>> have errors. and if del ops fail, ukey is deleted. so only ukeys at
>>> UKEY_EVICTING must
>>> come from modify fail.
>>
>> First of all, I missed that p_purge_cb() was pausing the revalidators, so
>> you are right that there is no other way to set UKEY_EVICTING other than
>> the sweet phase.
>>
>> I also missed the part that the ukey_delete() will do the actual removal,
>> and not the if (ukey_state == UKEY_EVICTED) statement lower.
>>
>> Maybe we should reconsider having a special state? Anyhow, I do think the
>> comment in the code needs some changing to make this clear:
>>
>
> Yes the comments are clearer, and I can add them in the next version.
>
>
>
>>
>>
>>             if (ukey_state == UKEY_EVICTING) {
>>                 /* A previous modify operation in push_dp_ops() failed and
>>                  * caused the state to be UKEY_EVICTING. We need to
>> manually
>>                  * push a delete operation on this ukey to get both the
>>                  * datapath flow and ukey removed. The later
>> push_ukey_ops()
>>                  * will call  push_dp_ops() and ukey_delete() to accomplish
>>                  * this. */
>>                 delete_op_init(udpif, &ops[n_ops++], ukey);
>>             }
>>
>>>> Maybe we should keep the INCONSISTENT state to avoid this (see v5)?
>>>>
>>
>> What about the case the push_dp_ops()/push_ukey_ops() fails in
>> revalidator_sweep__() for a none DELETE action, who will take care of this?
>> The alternative fix might be better in this case.
>>
>
> DELETE failure in the original code is taken cared. It will go to EVICTED
> directly.
> This leaves a possibility that a key is in the dp but not in the ukey,
> which is fine, because we have the mecahnism already:
>
> check ukey_create_from_dpif_flow and ukey_acquire fail path.

I meant to say if we do a modify in revalidator_sweep__()

If we take the following branch,

 if (ukey_state == UKEY_OPERATIONAL
                || (ukey_state == UKEY_VISIBLE && purge)) {

  and then  revalidate_ukey() gets called and it returns UKEY_MODIFY which 
calls push_dp_ops() and now assume this modify fails…

This will leaf the ukey in UKEY_EVICTING state, as there is no additional 
revalidator_sweep__() call.

This will lead to the below in the next revalidator() call:

VLOG_INFO("Unexpected ukey transition from state %d "
                          "(last transitioned from thread %u at %s)",

Or am I missing something?

Cheers,

Eelco

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to