Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-31 Thread Ilya Maximets
On 7/31/23 18:01, Mike Pattrick wrote:
> On Mon, Jul 31, 2023 at 7:20 AM Ilya Maximets  wrote:
>>
>> On 7/31/23 07:34, Mike Pattrick wrote:
>>> On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets  wrote:

 On 7/28/23 23:13, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets  > wrote:
>
> On 7/20/23 17:06, Mike Pattrick wrote:
> > When the a revalidator thread is updating statistics for an XC_LEARN
> > xcache entry in xlate_push_stats_entry it uses 
> ofproto_flow_mod_learn.
> > The revalidator will update stats for rules even if they are in a
> > removed state or marked as invisible. However, 
> ofproto_flow_mod_learn
> > will detect if a flow has been removed and re-add it in that case. 
> This
> > can result in an old learn action replacing the new learn action 
> that
> > had replaced it in the first place.
> >
> > This change adds a new parameter to ofproto_flow_mod_learn allowing 
> the
> > caller to specify an action to take if temp_rule is removed. If it 
> is
> > removed and the caller is xlate_push_stats_entry, then a new rule 
> will
> > not be replaced.
>
>
> This is a slightly unrelated point, but while reviewing some code and test
> results I think I found a different issue. rule_expire() considers rule
> modified time for hard_timeout and the man page says hard_timeout "Causes
> the flow to expire after the given number of seconds, regardless of 
> activity",
> however, modified time is updated in ofproto_flow_mod_learn_refresh() 
> even if
> the rule isn't changed. Due to the current behaviour, I don't think 
> hard_timeout
> could actually ever trigger on a learned action if an idle timeout is set 
> or if
> traffic is ongoing.

 The 'modified' time of a learned rule is updated in this function, because
 the packet hit the learn() rule.  If the learned rule didn't exist, we
 would create it at this moment.  Instead of re-creating, we just update
 the 'modified' time.  If packets will stop hitting the learn() rule, but
 will continue to flow through the learned one, the learned one will be
 removed when the hard timeout fires up.  That is a correct behavior.

>>>
>>> I can accept this is intended, but it doesn't seem like it would
>>> operate in that way based on the man page alone.
>>
>> It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
>> matching it.  learn() action is triggering a flow modification equal to a
>> strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl 
>> page
>> mentions that "mod-flows restarts the hard_timeout".
>>
>> Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
>> a patch.
>>
>>>

>
>
> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> correct way to fix it.
>
> It is important to re-install removed learned flows during stats 
> update.
> The logic is the following, AFAIU:
>
> 1. Packet hits the OF rule (1) with a learning action.
> 2. Handler translates OF and creates a datapath flow for it, creating
>a learned rule (2) in the process and filling the translation 
> cache.
> 3. Datapath flow for the rule (1) is installed into the datapath.
> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>is removed now.
> 5. A packet hits the datapath flow for rule (1).
> 6. Revalidator updates flow stats for the rule (1).
> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>added back.
>
> IIUC, the step 7 will not happen with the change proposed in this 
> patch.
> Flow will not be re-translated, so the only way to re-install it is
> to execute side effects when we see stats update on a parent datapath
> flow in the revalidator.
>
> One potential way to avoid breaking this scenario is to check the
> removal reason, but it's not bulletproof either as we do still need
> to reinstate the flow even if it was manually deleted or evicted.
>
>
> I added a log to check the removal reason in the XC_LEARN stats update, 
> and the
> reason always seems to be eviction. I wasn't able to catch an idle or 
> hard removal
> in that function, though I'm sure I just wasn't winning the race.
>
> But in this example if we hit 7, and don't re-add the rule, what is the 
> real penalty?
> The next packet will have to upcall and we would process the learn action 
> from scratch,
> but that may be what the user wants as indicated by setting a timeout.

 Not really.  If the learned flow is removed, we need a packet to 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-31 Thread Mike Pattrick
On Mon, Jul 31, 2023 at 7:20 AM Ilya Maximets  wrote:
>
> On 7/31/23 07:34, Mike Pattrick wrote:
> > On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets  wrote:
> >>
> >> On 7/28/23 23:13, Mike Pattrick wrote:
> >>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets  >>> > wrote:
> >>>
> >>> On 7/20/23 17:06, Mike Pattrick wrote:
> >>> > When the a revalidator thread is updating statistics for an XC_LEARN
> >>> > xcache entry in xlate_push_stats_entry it uses 
> >>> ofproto_flow_mod_learn.
> >>> > The revalidator will update stats for rules even if they are in a
> >>> > removed state or marked as invisible. However, 
> >>> ofproto_flow_mod_learn
> >>> > will detect if a flow has been removed and re-add it in that case. 
> >>> This
> >>> > can result in an old learn action replacing the new learn action 
> >>> that
> >>> > had replaced it in the first place.
> >>> >
> >>> > This change adds a new parameter to ofproto_flow_mod_learn allowing 
> >>> the
> >>> > caller to specify an action to take if temp_rule is removed. If it 
> >>> is
> >>> > removed and the caller is xlate_push_stats_entry, then a new rule 
> >>> will
> >>> > not be replaced.
> >>>
> >>>
> >>> This is a slightly unrelated point, but while reviewing some code and test
> >>> results I think I found a different issue. rule_expire() considers rule
> >>> modified time for hard_timeout and the man page says hard_timeout "Causes
> >>> the flow to expire after the given number of seconds, regardless of 
> >>> activity",
> >>> however, modified time is updated in ofproto_flow_mod_learn_refresh() 
> >>> even if
> >>> the rule isn't changed. Due to the current behaviour, I don't think 
> >>> hard_timeout
> >>> could actually ever trigger on a learned action if an idle timeout is set 
> >>> or if
> >>> traffic is ongoing.
> >>
> >> The 'modified' time of a learned rule is updated in this function, because
> >> the packet hit the learn() rule.  If the learned rule didn't exist, we
> >> would create it at this moment.  Instead of re-creating, we just update
> >> the 'modified' time.  If packets will stop hitting the learn() rule, but
> >> will continue to flow through the learned one, the learned one will be
> >> removed when the hard timeout fires up.  That is a correct behavior.
> >>
> >
> > I can accept this is intended, but it doesn't seem like it would
> > operate in that way based on the man page alone.
>
> It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
> matching it.  learn() action is triggering a flow modification equal to a
> strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl page
> mentions that "mod-flows restarts the hard_timeout".
>
> Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
> a patch.
>
> >
> >>
> >>>
> >>>
> >>> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> >>> correct way to fix it.
> >>>
> >>> It is important to re-install removed learned flows during stats 
> >>> update.
> >>> The logic is the following, AFAIU:
> >>>
> >>> 1. Packet hits the OF rule (1) with a learning action.
> >>> 2. Handler translates OF and creates a datapath flow for it, creating
> >>>a learned rule (2) in the process and filling the translation 
> >>> cache.
> >>> 3. Datapath flow for the rule (1) is installed into the datapath.
> >>> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
> >>>is removed now.
> >>> 5. A packet hits the datapath flow for rule (1).
> >>> 6. Revalidator updates flow stats for the rule (1).
> >>> 7. Since the packet hit the learning rule (1), the rule (2) has to be
> >>>added back.
> >>>
> >>> IIUC, the step 7 will not happen with the change proposed in this 
> >>> patch.
> >>> Flow will not be re-translated, so the only way to re-install it is
> >>> to execute side effects when we see stats update on a parent datapath
> >>> flow in the revalidator.
> >>>
> >>> One potential way to avoid breaking this scenario is to check the
> >>> removal reason, but it's not bulletproof either as we do still need
> >>> to reinstate the flow even if it was manually deleted or evicted.
> >>>
> >>>
> >>> I added a log to check the removal reason in the XC_LEARN stats update, 
> >>> and the
> >>> reason always seems to be eviction. I wasn't able to catch an idle or 
> >>> hard removal
> >>> in that function, though I'm sure I just wasn't winning the race.
> >>>
> >>> But in this example if we hit 7, and don't re-add the rule, what is the 
> >>> real penalty?
> >>> The next packet will have to upcall and we would process the learn action 
> >>> from scratch,
> >>> but that may be what the user wants as indicated by setting a timeout.
> >>
> >> Not really.  If the learned flow is removed, we need a packet to hit a
> >> learn() rule, which is a different rule, in 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-31 Thread Ilya Maximets
On 7/31/23 07:34, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets  wrote:
>>
>> On 7/28/23 23:13, Mike Pattrick wrote:
>>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets >> > wrote:
>>>
>>> On 7/20/23 17:06, Mike Pattrick wrote:
>>> > When the a revalidator thread is updating statistics for an XC_LEARN
>>> > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
>>> > The revalidator will update stats for rules even if they are in a
>>> > removed state or marked as invisible. However, ofproto_flow_mod_learn
>>> > will detect if a flow has been removed and re-add it in that case. 
>>> This
>>> > can result in an old learn action replacing the new learn action that
>>> > had replaced it in the first place.
>>> >
>>> > This change adds a new parameter to ofproto_flow_mod_learn allowing 
>>> the
>>> > caller to specify an action to take if temp_rule is removed. If it is
>>> > removed and the caller is xlate_push_stats_entry, then a new rule will
>>> > not be replaced.
>>>
>>>
>>> This is a slightly unrelated point, but while reviewing some code and test
>>> results I think I found a different issue. rule_expire() considers rule
>>> modified time for hard_timeout and the man page says hard_timeout "Causes
>>> the flow to expire after the given number of seconds, regardless of 
>>> activity",
>>> however, modified time is updated in ofproto_flow_mod_learn_refresh() even 
>>> if
>>> the rule isn't changed. Due to the current behaviour, I don't think 
>>> hard_timeout
>>> could actually ever trigger on a learned action if an idle timeout is set 
>>> or if
>>> traffic is ongoing.
>>
>> The 'modified' time of a learned rule is updated in this function, because
>> the packet hit the learn() rule.  If the learned rule didn't exist, we
>> would create it at this moment.  Instead of re-creating, we just update
>> the 'modified' time.  If packets will stop hitting the learn() rule, but
>> will continue to flow through the learned one, the learned one will be
>> removed when the hard timeout fires up.  That is a correct behavior.
>>
> 
> I can accept this is intended, but it doesn't seem like it would
> operate in that way based on the man page alone.

It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
matching it.  learn() action is triggering a flow modification equal to a
strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl page
mentions that "mod-flows restarts the hard_timeout".

Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
a patch.

> 
>>
>>>
>>>
>>> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
>>> correct way to fix it.
>>>
>>> It is important to re-install removed learned flows during stats update.
>>> The logic is the following, AFAIU:
>>>
>>> 1. Packet hits the OF rule (1) with a learning action.
>>> 2. Handler translates OF and creates a datapath flow for it, creating
>>>a learned rule (2) in the process and filling the translation cache.
>>> 3. Datapath flow for the rule (1) is installed into the datapath.
>>> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>>>is removed now.
>>> 5. A packet hits the datapath flow for rule (1).
>>> 6. Revalidator updates flow stats for the rule (1).
>>> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>>>added back.
>>>
>>> IIUC, the step 7 will not happen with the change proposed in this patch.
>>> Flow will not be re-translated, so the only way to re-install it is
>>> to execute side effects when we see stats update on a parent datapath
>>> flow in the revalidator.
>>>
>>> One potential way to avoid breaking this scenario is to check the
>>> removal reason, but it's not bulletproof either as we do still need
>>> to reinstate the flow even if it was manually deleted or evicted.
>>>
>>>
>>> I added a log to check the removal reason in the XC_LEARN stats update, and 
>>> the
>>> reason always seems to be eviction. I wasn't able to catch an idle or hard 
>>> removal
>>> in that function, though I'm sure I just wasn't winning the race.
>>>
>>> But in this example if we hit 7, and don't re-add the rule, what is the 
>>> real penalty?
>>> The next packet will have to upcall and we would process the learn action 
>>> from scratch,
>>> but that may be what the user wants as indicated by setting a timeout.
>>
>> Not really.  If the learned flow is removed, we need a packet to hit a
>> learn() rule, which is a different rule, in order for traffic to continue
>> to flow.  Let's say we implement a firewall that allows connections only
>> from A to B.  So, A can initiate connection to B, but B can't initiate
>> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
>> from B to A on these particular ports.  Now B 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-30 Thread Mike Pattrick
On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets  wrote:
>
> On 7/28/23 23:13, Mike Pattrick wrote:
> > On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets  > > wrote:
> >
> > On 7/20/23 17:06, Mike Pattrick wrote:
> > > When the a revalidator thread is updating statistics for an XC_LEARN
> > > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> > > The revalidator will update stats for rules even if they are in a
> > > removed state or marked as invisible. However, ofproto_flow_mod_learn
> > > will detect if a flow has been removed and re-add it in that case. 
> > This
> > > can result in an old learn action replacing the new learn action that
> > > had replaced it in the first place.
> > >
> > > This change adds a new parameter to ofproto_flow_mod_learn allowing 
> > the
> > > caller to specify an action to take if temp_rule is removed. If it is
> > > removed and the caller is xlate_push_stats_entry, then a new rule will
> > > not be replaced.
> >
> >
> > This is a slightly unrelated point, but while reviewing some code and test
> > results I think I found a different issue. rule_expire() considers rule
> > modified time for hard_timeout and the man page says hard_timeout "Causes
> > the flow to expire after the given number of seconds, regardless of 
> > activity",
> > however, modified time is updated in ofproto_flow_mod_learn_refresh() even 
> > if
> > the rule isn't changed. Due to the current behaviour, I don't think 
> > hard_timeout
> > could actually ever trigger on a learned action if an idle timeout is set 
> > or if
> > traffic is ongoing.
>
> The 'modified' time of a learned rule is updated in this function, because
> the packet hit the learn() rule.  If the learned rule didn't exist, we
> would create it at this moment.  Instead of re-creating, we just update
> the 'modified' time.  If packets will stop hitting the learn() rule, but
> will continue to flow through the learned one, the learned one will be
> removed when the hard timeout fires up.  That is a correct behavior.
>

I can accept this is intended, but it doesn't seem like it would
operate in that way based on the man page alone.

>
> >
> >
> > Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> > correct way to fix it.
> >
> > It is important to re-install removed learned flows during stats update.
> > The logic is the following, AFAIU:
> >
> > 1. Packet hits the OF rule (1) with a learning action.
> > 2. Handler translates OF and creates a datapath flow for it, creating
> >a learned rule (2) in the process and filling the translation cache.
> > 3. Datapath flow for the rule (1) is installed into the datapath.
> > 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
> >is removed now.
> > 5. A packet hits the datapath flow for rule (1).
> > 6. Revalidator updates flow stats for the rule (1).
> > 7. Since the packet hit the learning rule (1), the rule (2) has to be
> >added back.
> >
> > IIUC, the step 7 will not happen with the change proposed in this patch.
> > Flow will not be re-translated, so the only way to re-install it is
> > to execute side effects when we see stats update on a parent datapath
> > flow in the revalidator.
> >
> > One potential way to avoid breaking this scenario is to check the
> > removal reason, but it's not bulletproof either as we do still need
> > to reinstate the flow even if it was manually deleted or evicted.
> >
> >
> > I added a log to check the removal reason in the XC_LEARN stats update, and 
> > the
> > reason always seems to be eviction. I wasn't able to catch an idle or hard 
> > removal
> > in that function, though I'm sure I just wasn't winning the race.
> >
> > But in this example if we hit 7, and don't re-add the rule, what is the 
> > real penalty?
> > The next packet will have to upcall and we would process the learn action 
> > from scratch,
> > but that may be what the user wants as indicated by setting a timeout.
>
> Not really.  If the learned flow is removed, we need a packet to hit a
> learn() rule, which is a different rule, in order for traffic to continue
> to flow.  Let's say we implement a firewall that allows connections only
> from A to B.  So, A can initiate connection to B, but B can't initiate
> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
> from B to A on these particular ports.  Now B can send traffic back.
> Once the learned flow is removed, A will have to establish a new connection
> in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
> best example, but should work as an illustration.
>
> > The alternative
> > is we re-install a flow that may be different then the flow that the user 
> > actually
> > wants installed.
>
> If the flow is modified, it means that there was a traffic on a learn()

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-28 Thread Ilya Maximets
On 7/28/23 23:13, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets  > wrote:
> 
> On 7/20/23 17:06, Mike Pattrick wrote:
> > When the a revalidator thread is updating statistics for an XC_LEARN
> > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> > The revalidator will update stats for rules even if they are in a
> > removed state or marked as invisible. However, ofproto_flow_mod_learn
> > will detect if a flow has been removed and re-add it in that case. This
> > can result in an old learn action replacing the new learn action that
> > had replaced it in the first place.
> >
> > This change adds a new parameter to ofproto_flow_mod_learn allowing the
> > caller to specify an action to take if temp_rule is removed. If it is
> > removed and the caller is xlate_push_stats_entry, then a new rule will
> > not be replaced.
> 
>  
> This is a slightly unrelated point, but while reviewing some code and test
> results I think I found a different issue. rule_expire() considers rule
> modified time for hard_timeout and the man page says hard_timeout "Causes
> the flow to expire after the given number of seconds, regardless of activity",
> however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
> the rule isn't changed. Due to the current behaviour, I don't think 
> hard_timeout
> could actually ever trigger on a learned action if an idle timeout is set or 
> if
> traffic is ongoing.

The 'modified' time of a learned rule is updated in this function, because
the packet hit the learn() rule.  If the learned rule didn't exist, we
would create it at this moment.  Instead of re-creating, we just update
the 'modified' time.  If packets will stop hitting the learn() rule, but
will continue to flow through the learned one, the learned one will be
removed when the hard timeout fires up.  That is a correct behavior.

>  
> 
> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> correct way to fix it.
> 
> It is important to re-install removed learned flows during stats update.
> The logic is the following, AFAIU:
> 
> 1. Packet hits the OF rule (1) with a learning action.
> 2. Handler translates OF and creates a datapath flow for it, creating
>    a learned rule (2) in the process and filling the translation cache.
> 3. Datapath flow for the rule (1) is installed into the datapath.
> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>    is removed now.
> 5. A packet hits the datapath flow for rule (1).
> 6. Revalidator updates flow stats for the rule (1).
> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>    added back.
> 
> IIUC, the step 7 will not happen with the change proposed in this patch.
> Flow will not be re-translated, so the only way to re-install it is
> to execute side effects when we see stats update on a parent datapath
> flow in the revalidator.
> 
> One potential way to avoid breaking this scenario is to check the
> removal reason, but it's not bulletproof either as we do still need
> to reinstate the flow even if it was manually deleted or evicted.
> 
> 
> I added a log to check the removal reason in the XC_LEARN stats update, and 
> the
> reason always seems to be eviction. I wasn't able to catch an idle or hard 
> removal
> in that function, though I'm sure I just wasn't winning the race.
> 
> But in this example if we hit 7, and don't re-add the rule, what is the real 
> penalty?
> The next packet will have to upcall and we would process the learn action 
> from scratch,
> but that may be what the user wants as indicated by setting a timeout.

Not really.  If the learned flow is removed, we need a packet to hit a
learn() rule, which is a different rule, in order for traffic to continue
to flow.  Let's say we implement a firewall that allows connections only
from A to B.  So, A can initiate connection to B, but B can't initiate
connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
from B to A on these particular ports.  Now B can send traffic back.
Once the learned flow is removed, A will have to establish a new connection
in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
best example, but should work as an illustration.

> The alternative
> is we re-install a flow that may be different then the flow that the user 
> actually
> wants installed.

If the flow is modified, it means that there was a traffic on a learn()
action and we have to learn that specific rule.  If it was already removed,
we have to reinstate it.  In general, the logic in the current OVS code
seems correct in this particular place.

> 
> Isn't there a risk of different revalidator threads going out of sync with 
> different
> versions of a learned rule in their xcache?
> 
> 
> It also seems like we 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-28 Thread Mike Pattrick
On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets  wrote:

> On 7/20/23 17:06, Mike Pattrick wrote:
> > When the a revalidator thread is updating statistics for an XC_LEARN
> > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> > The revalidator will update stats for rules even if they are in a
> > removed state or marked as invisible. However, ofproto_flow_mod_learn
> > will detect if a flow has been removed and re-add it in that case. This
> > can result in an old learn action replacing the new learn action that
> > had replaced it in the first place.
> >
> > This change adds a new parameter to ofproto_flow_mod_learn allowing the
> > caller to specify an action to take if temp_rule is removed. If it is
> > removed and the caller is xlate_push_stats_entry, then a new rule will
> > not be replaced.
>
>
This is a slightly unrelated point, but while reviewing some code and test
results I think I found a different issue. rule_expire() considers rule
modified time for hard_timeout and the man page says hard_timeout "Causes
the flow to expire after the given number of seconds, regardless of
activity", however, modified time is updated in
ofproto_flow_mod_learn_refresh() even if the rule isn't changed. Due to the
current behaviour, I don't think hard_timeout could actually ever trigger
on a learned action if an idle timeout is set or if traffic is ongoing.


> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> correct way to fix it.
>
> It is important to re-install removed learned flows during stats update.
> The logic is the following, AFAIU:
>
> 1. Packet hits the OF rule (1) with a learning action.
> 2. Handler translates OF and creates a datapath flow for it, creating
>a learned rule (2) in the process and filling the translation cache.
> 3. Datapath flow for the rule (1) is installed into the datapath.
> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>is removed now.
> 5. A packet hits the datapath flow for rule (1).
> 6. Revalidator updates flow stats for the rule (1).
> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>added back.
>
> IIUC, the step 7 will not happen with the change proposed in this patch.
> Flow will not be re-translated, so the only way to re-install it is
> to execute side effects when we see stats update on a parent datapath
> flow in the revalidator.
>
> One potential way to avoid breaking this scenario is to check the
> removal reason, but it's not bulletproof either as we do still need
> to reinstate the flow even if it was manually deleted or evicted.
>

I added a log to check the removal reason in the XC_LEARN stats update, and
the reason always seems to be eviction. I wasn't able to catch an idle or
hard removal in that function, though I'm sure I just wasn't winning the
race.

But in this example if we hit 7, and don't re-add the rule, what is the
real penalty? The next packet will have to upcall and we would process the
learn action from scratch, but that may be what the user wants as indicated
by setting a timeout. The alternative is we re-install a flow that may be
different then the flow that the user actually wants installed.

Isn't there a risk of different revalidator threads going out of sync with
different versions of a learned rule in their xcache?


> It also seems like we may hold the reference to this deleted rule for a
> long time in the cache until the parent datapath flow is deleted.
>
>
> What is triggering the deletion of the learned rule in your case?
> Is it deletion or modification of the parent rule?  And does it have
> 'delete_learned' flag?
>

In the case that I was debugging, the parent learn action was not deleted.
Instead, new traffic from a different port was supposed to cause the learn
action to modify the existing rule. But instead we see the rule modified
with the correct information, and then immediately reverted with old stale
information.


>
> Best regards, Ilya Maximets.
>
>
Thank you,
Mike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-28 Thread Ilya Maximets
On 7/20/23 17:06, Mike Pattrick wrote:
> When the a revalidator thread is updating statistics for an XC_LEARN
> xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> The revalidator will update stats for rules even if they are in a
> removed state or marked as invisible. However, ofproto_flow_mod_learn
> will detect if a flow has been removed and re-add it in that case. This
> can result in an old learn action replacing the new learn action that
> had replaced it in the first place.
> 
> This change adds a new parameter to ofproto_flow_mod_learn allowing the
> caller to specify an action to take if temp_rule is removed. If it is
> removed and the caller is xlate_push_stats_entry, then a new rule will
> not be replaced.

Hi, Mike.  That's an interesting issue, though I'm not sure that is a
correct way to fix it.

It is important to re-install removed learned flows during stats update.
The logic is the following, AFAIU:

1. Packet hits the OF rule (1) with a learning action.
2. Handler translates OF and creates a datapath flow for it, creating
   a learned rule (2) in the process and filling the translation cache.
3. Datapath flow for the rule (1) is installed into the datapath.
4. Time goes by and timeout on the learned rule (2) hits. Learned rule
   is removed now.
5. A packet hits the datapath flow for rule (1).
6. Revalidator updates flow stats for the rule (1).
7. Since the packet hit the learning rule (1), the rule (2) has to be
   added back.

IIUC, the step 7 will not happen with the change proposed in this patch.
Flow will not be re-translated, so the only way to re-install it is
to execute side effects when we see stats update on a parent datapath
flow in the revalidator.

One potential way to avoid breaking this scenario is to check the
removal reason, but it's not bulletproof either as we do still need
to reinstate the flow even if it was manually deleted or evicted.

It also seems like we may hold the reference to this deleted rule for a
long time in the cache until the parent datapath flow is deleted.


What is triggering the deletion of the learned rule in your case?
Is it deletion or modification of the parent rule?  And does it have
'delete_learned' flag?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

2023-07-21 Thread Eelco Chaudron



On 20 Jul 2023, at 17:06, Mike Pattrick wrote:

> When the a revalidator thread is updating statistics for an XC_LEARN
> xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> The revalidator will update stats for rules even if they are in a
> removed state or marked as invisible. However, ofproto_flow_mod_learn
> will detect if a flow has been removed and re-add it in that case. This
> can result in an old learn action replacing the new learn action that
> had replaced it in the first place.
>
> This change adds a new parameter to ofproto_flow_mod_learn allowing the
> caller to specify an action to take if temp_rule is removed. If it is
> removed and the caller is xlate_push_stats_entry, then a new rule will
> not be replaced.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2213892
> Signed-off-by: Mike Pattrick 

Hi Mike,

Thanks for this, patch! I went over the code, and the fix looks fine to me. 
However, due to being on PTO in a couple of hours, I did not fully go over the 
actual LEARN action implementation to find potential side effects. Maybe 
someone else has time to look at the potential side effects of this change.

I did run all the dp tests, with and without ASAN, and no problems were found.

With the above side note,

Acked-by: Eelco Chaudron 

Cheers,

Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev