On 7/31/23 07:34, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 7/28/23 23:13, Mike Pattrick wrote:
>>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> On 7/20/23 17:06, Mike Pattrick wrote:
>>> > When a revalidator thread is updating statistics for an XC_LEARN
>>> > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
>>> > The revalidator will update stats for rules even if they are in a
>>> > removed state or marked as invisible. However, ofproto_flow_mod_learn
>>> > will detect if a flow has been removed and re-add it in that case.
>>> > This can result in an old learn action replacing the new learn action
>>> > that had replaced it in the first place.
>>> >
>>> > This change adds a new parameter to ofproto_flow_mod_learn allowing
>>> > the caller to specify an action to take if temp_rule is removed. If it
>>> > is removed and the caller is xlate_push_stats_entry, then a new rule
>>> > will not be replaced.
>>>
>>>
>>> This is a slightly unrelated point, but while reviewing some code and
>>> test results I think I found a different issue. rule_expire() considers
>>> rule modified time for hard_timeout, and the man page says hard_timeout
>>> "Causes the flow to expire after the given number of seconds, regardless
>>> of activity". However, modified time is updated in
>>> ofproto_flow_mod_learn_refresh() even if the rule isn't changed. Due to
>>> the current behaviour, I don't think hard_timeout could ever actually
>>> trigger on a learned action if an idle timeout is set or if traffic is
>>> ongoing.
>>
>> The 'modified' time of a learned rule is updated in this function because
>> the packet hit the learn() rule. If the learned rule didn't exist, we
>> would create it at this moment. Instead of re-creating it, we just update
>> the 'modified' time. If packets stop hitting the learn() rule but
>> continue to flow through the learned one, the learned one will be
>> removed when the hard timeout fires. That is correct behavior.
>>
>
> I can accept this is intended, but it doesn't seem like it would
> operate in that way based on the man page alone.
It's a generic characteristic of an OpenFlow rule: "activity" means traffic
matching it. The learn() action triggers a flow modification equivalent to a
strict mod-flow, as described in the ovs-actions man page. And the ovs-ofctl
page mentions that "mod-flows restarts the hard_timeout".
Maybe it can be clarified somehow better, I'm not sure. Feel free to submit
a patch.
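
For illustration, here is a sketch of such a learn() flow in OpenFlow flow
syntax. The table numbers, timeout, and use of NORMAL are hypothetical, not
taken from any particular setup:

```
# Parent rule: learns a reverse-path rule with a 60s hard timeout.
# Each matching packet re-runs learn(), which acts like a strict
# mod-flow on the learned rule, refreshing its 'modified' time and
# therefore restarting its hard_timeout.
table=0, actions=learn(table=10, hard_timeout=60,
                       NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],
                       output:NXM_OF_IN_PORT[]), NORMAL
```

So as long as traffic keeps hitting the table=0 rule, the learned rule in
table=10 never reaches its hard_timeout.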
>
>>
>>>
>>>
>>> Hi, Mike. That's an interesting issue, though I'm not sure that is the
>>> correct way to fix it.
>>>
>>> It is important to re-install removed learned flows during stats update.
>>> The logic is the following, AFAIU:
>>>
>>> 1. Packet hits the OF rule (1) with a learning action.
>>> 2. Handler translates OF and creates a datapath flow for it, creating
>>> a learned rule (2) in the process and filling the translation cache.
>>> 3. Datapath flow for the rule (1) is installed into the datapath.
>>> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>>> is removed now.
>>> 5. A packet hits the datapath flow for rule (1).
>>> 6. Revalidator updates flow stats for the rule (1).
>>> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>>> added back.
>>>
>>> IIUC, step 7 will not happen with the change proposed in this patch.
>>> The flow will not be re-translated, so the only way to re-install it is
>>> to execute side effects when we see a stats update on a parent datapath
>>> flow in the revalidator.
>>>
>>> One potential way to avoid breaking this scenario is to check the
>>> removal reason, but it's not bulletproof either as we do still need
>>> to reinstate the flow even if it was manually deleted or evicted.
>>>
>>>
>>> I added a log to check the removal reason in the XC_LEARN stats update,
>>> and the reason always seems to be eviction. I wasn't able to catch an
>>> idle or hard removal in that function, though I'm sure I just wasn't
>>> winning the race.
>>>
>>> But in this example, if we hit 7 and don't re-add the rule, what is the
>>> real penalty? The next packet will have to upcall and we would process
>>> the learn action from scratch, but that may be what the user wants, as
>>> indicated by setting a timeout.
>>
>> Not really. If the learned flow is removed, we need a packet to hit a
>> learn() rule, which is a different rule, in order for traffic to continue
>> to flow. Let's say we implement a firewall that allows connections only
>> from A to B. So, A can initiate a connection to B, but B can't initiate a
>> connection to A. Once A sends a TCP SYN, we learn a rule that allows
>> traffic from B to A on these particular ports. Now B can send traffic
>> back. Once the learned flow is removed, A will have to establish a new
>> connection in order to re-learn the B->A rule. So, not just an upcall.
>> It may not be the best example, but it should work as an illustration.
>>
>>> The alternative is we re-install a flow that may be different from the
>>> flow that the user actually wants installed.
>>
>> If the flow is modified, it means that there was traffic on a learn()
>> action and we have to learn that specific rule. If it was already removed,
>> we have to reinstate it. In general, the logic in the current OVS code
>> seems correct in this particular place.
>>
>>>
>>> Isn't there a risk of different revalidator threads going out of sync
>>> with different versions of a learned rule in their xcache?
>>>
>>>
>>> It also seems like we may hold the reference to this deleted rule for a
>>> long time in the cache until the parent datapath flow is deleted.
>>>
>>>
>>> What is triggering the deletion of the learned rule in your case?
>>> Is it deletion or modification of the parent rule? And does it have
>>> 'delete_learned' flag?
>>>
>>>
>>> In the case that I was debugging, the parent learn action was not
>>> deleted. Instead, new traffic from a different port was supposed to
>>> cause the learn action to modify the existing rule. But instead we see
>>> the rule modified with the correct information, and then immediately
>>> reverted with old stale information.
>>
>> The problem here is that we can't define which one is 'correct'.
>> It's a race where both values are correct, but you like one better
>> for some reason. New rules are only learned if the parent datapath
>> flow that has a learn() action has seen packets since the previous
>> stats check.
>>
>> What you're describing is that you have a single rule with a learn
>> action that is capable of re-writing the same learned rule depending
>> on what kind of packet actually hits it. Is that correct?
>
> In this case the rule is:
>
> learn(table=20,hard_timeout=300,priority=1,cookie=0xcb684d24d14b14ad,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
>
> And an example of the learned rule is:
>
> vlan_tci=0x0005/0x0fff,dl_dst=fc:16:3e:3e:5d:e0,actions=load:0->NXM_OF_VLAN_TCI[],load:0x3->NXM_NX_TUN_ID[],output:2
>
> The scenario is that ingressing packets are supposed to change the
> route that egress packets take, and that route can change over time.
>
> What I see from some additional logging is:
>
> 1. traffic begins to ingress on port A
> 2. learn action installs a rule to route response to A
> 3. traffic begins to egress on port A
> 4. port A becomes invalid, no further traffic ingresses from port A
> 5. traffic ingresses on port B
> 6. learn action installs a rule to route response to B
> 7. revalidator replaces (6) with (2)
>
> Eventually the egressing traffic becomes exhausted and the correct
> rule is installed, but this can take several minutes. The rule
> regression can even happen many milliseconds after learn installs the
> correct rule, and even after packets have flowed through the new rule.
The rule can be replaced if there was no revalidation between steps
4 and 5, i.e. at the point of revalidation there is a stats increase
on both datapath flows (ingress A and ingress B).
Revalidators can sleep for up to 500ms by default between revalidation
cycles, so the replacement can happen many milliseconds afterwards.
Is the traffic on port B continuous, or is it a one-shot packet?
The point is, at least one revalidation cycle should pass after step 4,
and the traffic on port B should be seen after this revalidation cycle
is completed. So, step 5 should be at least 500+ ms apart from step 4.
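
To make the timing argument concrete, here is a minimal Python sketch. This
is not OVS code: the single-rule table and the per-cycle write model are
assumptions used only to illustrate last-writer-wins refreshes within one
revalidation cycle, and the race resolving once only one flow still sees
traffic:

```python
import random

class LearnedRuleTable:
    """Stands in for the OpenFlow table holding the single learned rule."""
    def __init__(self):
        self.rule = None  # actions of the learned rule, if installed

    def refresh(self, actions):
        # A learn() refresh acts like a strict mod-flow: overwrite in place.
        self.rule = actions

def revalidation_cycle(table, flows_with_traffic):
    # Each datapath flow that saw packets since the last cycle pushes its
    # cached learn() result; within one cycle, the write order is arbitrary.
    order = list(flows_with_traffic)
    random.shuffle(order)
    for actions in order:
        table.refresh(actions)

table = LearnedRuleTable()
# Cycle 1: both ingress-A and ingress-B flows saw packets -> winner is random.
revalidation_cycle(table, ["output:A", "output:B"])
# Cycle 2: only ingress-B still has traffic -> the race resolves.
revalidation_cycle(table, ["output:B"])
print(table.rule)  # "output:B"
```

After the first cycle, the rule may hold either set of actions; only the
next cycle, with one quiescent flow, gives a deterministic result.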
>
>
>> I would consider this kind of setup as fairly fragile and would not
>> recommend using it in general.
>>
>> However, the scenario is likely like this:
>>
>> 1. Upcall on a rule with learn() action.
>> New datapath flow (1) is installed. This flow has a match on
>> all the fields used in a learn action and ukey contains the
>> translation cache with a rule that it produces.
>> The new learned rule is installed from this.
>>
>> 2. Upcall with a different packet that doesn't match the first
>> datapath flow (1), but on the same rule with a learn() action.
>> New datapath flow (2) is installed. This flow has a match on
>> all the fields used in a learn action (they are different from
>> the first one, since we have an upcall) and ukey contains the
>> translation cache with a rule that it produces. Let's say the
>> learned rule has the same match, but different actions.
>> This overwrites the previously installed learned rule.
>>
>> 3. Both datapath flows (1) and (2) get some traffic.
>>
>> 4. Revalidation starts.
>>
>> 5. Revalidator A dumps datapath flow (1).
>> Revalidator B dumps datapath flow (2).
>>
>> 6. Revalidators A and B are trying to revalidate these datapath flows
>> at the same time. Both datapath flows had traffic since the
>> last revalidation. Both revalidators are trying to update the
>> learned rule, which is the same for both of them.
>>
>> 7. One of them does that first, the other one ends up with a rule in a
>> REMOVED state, since it was just replaced by the other revalidator.
>>
>>
>> There is no correct or incorrect rule in this situation. Both
>> of them are correct, because both datapath flows have seen packets
>> since the previous revalidation. We can't guarantee which one will
>> end up being written the last and therefore will take effect.
>> If we refuse to reinstate the REMOVED rule, we still can't say which
>> one will be active in the end.
>>
>> What can resolve the race is the next revalidation cycle.
>> Once one of the datapath flows stops receiving traffic, there
>> should no longer be a race condition, and the next revalidation will
>> set the "correct" rule, i.e. the rule that corresponds to the learning
>> datapath flow that saw some packets last. But until both flows have
>> traffic, there is no "correct" rule and the result will be random,
>> regardless of what we choose to do.
>>
>>
>> One issue that I see in the current code is that it looks like the table
>> version is not getting updated on learned flow refresh and hence the next
>> revalidation may not actually be triggered. In this case the random
>> result may stay for much longer than we would like.
>>
>> The suspicious code (and the potential fix) is the following:
>>
>> @@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct
>> ofproto_flow_mod *ofm)
>> }
>>
>> enum ofperr
>> -ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
>> - struct ofproto *orig_ofproto)
>> +ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
>> OVS_REQUIRES(ofproto_mutex)
>> {
>> struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
>> enum ofperr error = 0;
>>
>> - /* If learning on a different bridge, must bump its version
>> - * number and flush connmgr afterwards. */
>> - if (rule->ofproto != orig_ofproto) {
>> - ofproto_bump_tables_version(rule->ofproto);
>> - }
>> + ofproto_bump_tables_version(rule->ofproto);
>> error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
>> - if (rule->ofproto != orig_ofproto) {
>> - ofmonitor_flush(rule->ofproto->connmgr);
>> - }
>> + ofmonitor_flush(rule->ofproto->connmgr);
>>
>> return error;
>> }
>> ---
>>
>> We should actually bump the version and request a revalidation in
>> any case, not only when the change is in a different bridge.
FWIW, the original code in commit 1f4a89336682 was written mostly
for a case where learning is triggered by a packet-out executed
within a bundle, and we can't actually publish changes for the current
bridge in this case. So, the fix should be a bit more clever.
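
As a rough sketch of what "a bit more clever" might look like, here is some
hypothetical Python pseudologic. The in_bundle flag and the bump/flush
callbacks are illustrative stand-ins, not actual OVS APIs:

```python
def learn_finish(rule_bridge, orig_bridge, in_bundle, bump, flush):
    """Publish a learned-flow change unless a same-bridge bundle commit
    will publish it for us. Hypothetical sketch, not OVS code."""
    if rule_bridge != orig_bridge or not in_bundle:
        # Bump the table version so revalidators notice the change.
        bump(rule_bridge)
    # ofproto_flow_mod_finish() would run here.
    if rule_bridge != orig_bridge:
        # Learning on a different bridge: flush its flow monitors too.
        flush(rule_bridge)

calls = []
learn_finish("br0", "br0", in_bundle=False,
             bump=lambda b: calls.append(("bump", b)),
             flush=lambda b: calls.append(("flush", b)))
print(calls)  # [('bump', 'br0')] -- same bridge, no bundle: version bumped
```

The point is only that the version bump should happen in the non-bundle
same-bridge case too, while the bundle path keeps its deferred publishing.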
>>
>> Could you try that in your setup?
>
> That was actually my initial suspicion when investigating this bug,
> but this change doesn't improve the buggy behaviour.
>
> Thank you,
> M
>
>>
>>
>> The issue appears to come from the commit:
>> 1f4a89336682 ("ofproto: Refactor packet_out handling.")
>>
>> There is a slight chance of launching ourselves into an infinite
>> revalidation loop, I guess, especially if both learning datapath
>> flows continue to receive traffic. But there is a minimal
>> revalidation interval, IIRC. Not sure if it applies in this case.
>>
>> Best regards, Ilya Maximets.
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev