On 10 Dec 2022, at 1:37, Peng He wrote:

> Patch v5 has statistics issues.
>
> In order to solve this issue, we had a discussion.
>
> below is the quote of the email.
>
> ”
> After a second thought, I think maybe keeping INCONSISTENT just for the
> modify error is a better option.
>
> With current patch:
> 1.
> the modify error case:
> OPERATIONAL -> INCONSISTENT ->  EVICTING -> EVICTED
> 2.
> the delete error case:
> EVICTING -> EVICTED
>
> Change both to INCONSISTENT:
>
> the modify error case:
> did not change.
>
> the delete error case:
> EVICTING -> INCONSISTENT -> EVICTED?
>
> “
>
> And we agree to take the second solution.

I know, but going over the state meanings again, UKEY_EVICTING means the 
following:

 /* Ukey is in umap, datapath flow delete is queued. */

Which now no longer is the case, so should a new state not make more sense?

Any one else has some input on this??

> Eelco Chaudron <[email protected]> 于2022年12月8日周四 18:54写道:
>
>>
>>
>> On 27 Nov 2022, at 8:28, Peng He wrote:
>>
>>> push_dp_ops only handles delete ops errors but ignores the modify
>>> ops results. It's better to handle all the dp operation errors in
>>> a consistent way.
>>>
>>> We observe in the production environment that sometimes a megaflow
>>> with wrong actions keep staying in datapath. The coverage command shows
>>> revalidators have dumped several times, however the correct
>>> actions are not set. This implies that the ukey's action does not
>>> equal to the meagaflow's, i.e. revalidators think the underlying
>>> megaflow's actions are correct however they are not.
>>>
>>> We also check the megaflow using the ofproto/trace command, and the
>>> actions are not matched with the ones in the actual magaflow. By
>>> performing a revalidator/purge command, the right actions are set.
>>>
>>> This patch prevents the inconsistency by considering modify failure
>>> in revalidators.
>>>
>>> To note, we cannot perform two state transitions and change ukey_state
>>> into UKEY_EVICTED directly here, because, if we do so, the
>>> sweep will remove the ukey alone and leave dp flow alive. Later, the
>>> dump will retrieve the dp flow and might even recover it. This will
>>> contribute the stats of this dp flow twice.
>>>
>>> Signed-off-by: Peng He <[email protected]>
>>> ---
>>>  ofproto/ofproto-dpif-upcall.c | 34 +++++++++++++++++++++++-----------
>>>  1 file changed, 23 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/ofproto/ofproto-dpif-upcall.c
>> b/ofproto/ofproto-dpif-upcall.c
>>> index 7ad728adf..c2cefbeb8 100644
>>> --- a/ofproto/ofproto-dpif-upcall.c
>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>> @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>> *ops, size_t n_ops)
>>>
>>>      for (i = 0; i < n_ops; i++) {
>>>          struct ukey_op *op = &ops[i];
>>> -        struct dpif_flow_stats *push, *stats, push_buf;
>>> -
>>> -        stats = op->dop.flow_del.stats;
>>> -        push = &push_buf;
>>> -
>>> -        if (op->dop.type != DPIF_OP_FLOW_DEL) {
>>> -            /* Only deleted flows need their stats pushed. */
>>> -            continue;
>>> -        }
>>>
>>>          if (op->dop.error) {
>>> -            /* flow_del error, 'stats' is unusable. */
>>>              if (op->ukey) {
>>>                  ovs_mutex_lock(&op->ukey->mutex);
>>> -                transition_ukey(op->ukey, UKEY_EVICTED);
>>> +                if (op->dop.type == DPIF_OP_FLOW_DEL) {
>>> +                    transition_ukey(op->ukey, UKEY_EVICTED);
>>> +                } else {

I think we could use a comment here to make sure why we set it to evicting. 
Maybe just a reference to the comment in revalidator_sweep__() might be enough.

>>> +                    transition_ukey(op->ukey, UKEY_EVICTING);
>>> +                }
>>>                  ovs_mutex_unlock(&op->ukey->mutex);
>>>              }
>>>              continue;
>>>          }
>>>
>>> +        if (op->dop.type != DPIF_OP_FLOW_DEL) {
>>> +            /* Only deleted flows need their stats pushed. */
>>> +            continue;
>>> +        }
>>> +
>>> +        struct dpif_flow_stats *push, *stats, push_buf;
>>> +
>>> +        stats = op->dop.flow_del.stats;
>>> +        push = &push_buf;
>>> +
>>>          if (op->ukey) {
>>>              ovs_mutex_lock(&op->ukey->mutex);
>>>              transition_ukey(op->ukey, UKEY_EVICTED);
>>> @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator
>> *revalidator, bool purge)
>>>                  continue;
>>>              }
>>>              ukey_state = ukey->state;
>>> +
>>> +            if (ukey_state == UKEY_EVICTING) {
>>> +                /* previous modify operation fails on this ukey and
>> ukey_state
>>> +                 * is set to UKEY_EVICTING, issue a delete operation on
>> this
>>> +                 * ukey.
>>> +                 */
>>> +                delete_op_init(udpif, &ops[n_ops++], ukey);

This would cause the later push_ukey_ops() to call ukey_delete() which will 
call transition_ukey(ukey, UKEY_DELETED).
Which I think will result in an invalid state transaction from UKEY_EVICTING to 
UKEY_DELETED.

>>
>> How can we be sure this state here is only for failing to modify
>> operations?
>> This state is set in other places in the code also.
>>
>
> because revalidate_sweep is called after revaldiate. all the datapath ops
> have already been
> pushed, and if there are ukeys at UKEY_EVICTING, it's because previously
> datapath ops
> have errors. and if del ops fail, ukey is deleted. so only ukeys at
> UKEY_EVICTING must
> come from modify fail.

First of all, I missed that p_purge_cb() was pausing the revalidators, so you 
are right that there is no other way to set UKEY_EVICTING other than the sweet 
phase.

I also missed the part that the ukey_delete() will do the actual removal, and 
not the if (ukey_state == UKEY_EVICTED) statement lower.

Maybe we should reconsider having a special state? Anyhow, I do think the 
comment in the code needs some changing to make this clear:


            if (ukey_state == UKEY_EVICTING) {
                /* A previous modify operation in push_dp_ops() failed and
                 * caused the state to be UKEY_EVICTING. We need to manually
                 * push a delete operation on this ukey to get both the
                 * datapath flow and ukey removed. The later push_ukey_ops()
                 * will call  push_dp_ops() and ukey_delete() to accomplish
                 * this. */
                delete_op_init(udpif, &ops[n_ops++], ukey);
            }

>> Maybe we should keep the INCONSISTENT state to avoid this (see v5)?
>>

What about the case the push_dp_ops()/push_ukey_ops() fails in 
revalidator_sweep__() for a none DELETE action, who will take care of this?
The alternative fix might be better in this case.

>>> +            }
>>>              if (ukey_state == UKEY_OPERATIONAL
>>>                  || (ukey_state == UKEY_VISIBLE && purge)) {
>>>                  struct recirc_refs recircs =
>> RECIRC_REFS_EMPTY_INITIALIZER;
>>> --
>>> 2.25.1
>>
>>
>
> -- 
> hepeng

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to