On 7/11/24 00:19, Adrián Moreno wrote:
> On Wed, Jul 10, 2024 at 11:31:42PM GMT, Ilya Maximets wrote:
>> On 7/7/24 22:09, Adrian Moreno wrote:
>>> When an action set ends in a an OFP_SAMPLE action, it means the packet
>>> will be dropped and sampled. Make the drop explicit in this case so that
>>> drop statistics remain accurate.
>>>
>>> This could be done outside of the sample action, i.e: "sample(...),drop"
>>> but datapaths optimize sample actions that are found in the last
>>> position. So, taking into account that datapaths already report when the
>>> last sample probability fails, it is safe to put the drop inside the
>>> sample, i.e: "sample(...,drop)".
>>>
>>> Signed-off-by: Adrian Moreno <amore...@redhat.com>
>>> ---
>>>  ofproto/ofproto-dpif-xlate.c | 16 +++++++--
>>>  tests/drop-stats.at          | 65 ++++++++++++++++++++++++++++++++++++
>>>  tests/ofproto-dpif.at        | 44 ++++++++++++++++++++++++
>>>  3 files changed, 123 insertions(+), 2 deletions(-)
>>
>> Hi, Adrian.  As we discussed before on the kernel list, I'm not sure
>> this is a right change to make for a fwe reasons:
>>
>> 1. We do not know the user's intentions.  We do not know if they
>>    wanted to drop these packets or their goal was to sample them
>>    and it just happened to be the last action in the list, because
>>    they put it after the output action and not before.
>>
> 
> If there are datapath actions after output action you will probably get
> drops reported in both datapaths.

Not if it was a sampling action.

> 
>> 2. This patch doesn't cover cases where sampling is not actually
>>    the last action, but further actions do not yield any datapath
>>    actions.  E.g. if you have a register resoration afterwards,
>>    which happens in automatically built pipelines like in OVN.
>>    Or if the last action after sampling is learn().  And things are
>>    getting more complicated if we take into account resubmits and
>>    processing in different bridges via patch ports.
> 
> I agree this could be problematic. Maybe we should make sure the sample
> is the last dp action and "fix it". A trick such as the one done for
> sflow.

These tricks are not accurate as well.  I beleive they do not track
the information well across tunnel encaps at least.

> 
>>
>> 3. If someone is sampling the drops specifically, they know where
>>    to find the packets, because they configured the collector.
>>
> 
> I think drop stats and samples are two different things. There are
> typically extacted by different tools and systems.
> 
> Besides, what about per-bridge sampling (next patch)?
> The user enables sampling on a bridge without explicitly doing it
> for drops and suddenly the drop statistics dissapear.

I think 'user enables sampling on a bridge' counts as an explicit
altering of the datapath pipeline.

> 
>> 4. Packets are not actually dropped, they are delivered to userspace
>>    or psample.  It might make sense though to drop with a reson in
>>    case the upcall fails or psample fails to deliver to any entity
>>    and it is the last action in the datapath, but that's a kernel
>>    change to make.
>>
> 
> I don't want to get into another semantic discussion between consume,
> free and drop or the dark corners of the OpenFlow protocol. For me it's
> pretty clear that if the last action is to sample, the packet is
> "dropped" in the sense that, from a switch' perspective, if it's not
> forwarded/sent somewhere, it's dropped.
> 
> I know you don't think the same :-).
> 
> Would you then agree that the concept of dropping packets is very
> unclear and OpenFlow does not make it easy (or even possible?) to
> express a sampled drops and we should add an extension action to
> explicitly drop packets?

I agree that dropping packets is not clear in this case.
I'm not a fan of OpenFlow drop action, it may cause extra logical issues.
I'd go with a database knob instead.

I'd say we can either drop these two patches for now and find a better
solution in the next release cycle, or add an experimental global database
knob that will control this behavior and will be disabled by default.
Once we have better understanding how it behaves in a real world, we
could switch the default or remove the feature if it causes issues or
confusion.

What do you think?

> 
>> 5. Drop reporting in either datapath implementation is not 100%
>>    accurate anyway and requires users to know the kernel internals.
>>
> 
> Shouldn't we try to make them more accurate, not less?

We're not making them less accurate.

> 
>> Some of that also applies to the next patch in the set.
>>
> 
> I can take some of the critics for this patch but for me, the next one
> is bluntly fixing a bug: Per-bridge sflow/IPFIX should not break drop
> statistics.

It doesn't break the drop statistics if we see sampling as terminal
action.  It's not a drop from the datapath perspective.

> 
>> For the patch itself, you should also check that the action set is
>> actually empty (not the action list) after the sample action translation.
> 
> Aye, what about doing the sflow trick and look at the datapath flow
> alone?
> 
>> There is a chance that the clone action is disabled and sample is
>> used in place of a clone, I'm not sure this case is covered.
> 
> That case does not go through compose_sample_action but encodes the
> datapath action directly.
> 
>> And the formatting in tests seems very inconsistent.
>>
> 
> I agree, I tried to follow the drop-stats.at "style" and the result is
> pretty bad.
> 
>> Best regards, Ilya Maximets.
>>
> 
> Thanks.
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to