Hi Dumitru,

Thanks again for confirming the behavior of the sampling rate. If the probability is applied per flow, then the current sample_collector design is indeed sufficient: we can use metadata to separate ACL sampling domains between different teams, which addresses my concern about traffic imbalance.
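To double-check my reading of the per-flow behavior, here is a minimal Python sketch. It is my own illustration, not OVS or OVN code. It assumes the probability is a 16-bit value where 65535 means 100% (as the flow_sample(probability=65535, ...) dumps further down the thread suggest) and that each sample action draws independently for every packet that hits its flow. The names expected_samples and simulate are hypothetical.

```python
import random

# Assumption: 65535 corresponds to sampling every packet (100%).
OVS_PROBABILITY_MAX = 65535


def expected_samples(packets: int, probability: int) -> float:
    """Expected number of samples for one flow hit by `packets` packets."""
    return packets * probability / OVS_PROBABILITY_MAX


def simulate(packets_per_metadata: dict, probability: int, seed: int = 0) -> dict:
    """Draw independently per packet and per flow, keyed by Sample.metadata."""
    rng = random.Random(seed)
    p = probability / OVS_PROBABILITY_MAX
    return {
        metadata: sum(rng.random() < p for _ in range(packets))
        for metadata, packets in packets_per_metadata.items()
    }


# Scenario from the thread: 90 packets match ACL_A (metadata 100),
# 10 packets match ACL_B (metadata 200), shared collector at 6553 (about 10%).
traffic = {100: 90, 200: 10}

for metadata, packets in traffic.items():
    print(metadata, round(expected_samples(packets, 6553), 2))

# One random outcome; individual runs vary around the expected values.
print(simulate(traffic, 6553))
```

On average this gives about 9 samples for metadata 100 and about 1 for metadata 200, which is interpretation (2). Any single run can still yield zero samples for the smaller flow, so low-volume teams may need a higher probability or a longer observation window.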
Regarding psample, I’d like to confirm whether there's a minimum required kernel version for it to work properly.

From the Q&A section in your OVSCON’24 presentation, I understood that kernel *6.11* might be required, but I also saw that psample was introduced upstream as early as *4.11*.

My current setup is:

- *OVS Library version*: 3.4.0
- *Linux Kernel*: 5.15.0-134-generic (Ubuntu)

So far, I haven’t observed any psample activity. Is there a specific kernel OVS configuration or module that must be enabled for psample flows to be generated?

If this is something I should explore further on my own, I completely understand; I just wanted to double-check before diving deeper.

Thanks again for all the support and insight you've shared.

Best regards,
*Oscar*

On Mon, May 19, 2025 at 3:46 PM Dumitru Ceara <[email protected]> wrote:

> On 5/19/25 9:01 AM, Trọng Đạt Trần wrote:
> > Hi Dumitru,
> >
>
> Hi Oscar,
>
> > I’d like to verify my understanding of how sampling behaves under traffic
> > imbalance, specifically when multiple ACLs use the *same sample_collector*.
> > ------------------------------
> > 🔧 Simplified Scenario
> >
> > - *ACL_A* (Team A) is configured with:
> >   - sample action → metadata: 100
> >   - uses sample_collector_share
> > - *ACL_B* (Team B) is configured with:
> >   - sample action → metadata: 200
> >   - uses the *same* sample_collector_share
> > - sample_collector_share is configured with:
> >   - probability = 6553 (10%)
> >
> > Now assume the following:
> >
> > - 90 packets match *ACL_A*
> > - 10 packets match *ACL_B*
> >
> > ------------------------------
> > ❓Question
> >
> > Which of the two behaviors should I expect?
> >
> > *(1)* A total of *10 packets randomly sampled* from the full 100 packets,
> > regardless of metadata (since the sample configurations share the same
> > sample_collector);
> > *or*
> > *(2)* A *proportional sampling* outcome:
> >
> > - 9 packets sampled from ACL_A (90 × 10%)
> > - 1 packet sampled from ACL_B (10 × 10%)
> >
> > ------------------------------
> > 📖 Documentation vs. OpenFlow Action
> >
> > The OVN NB schema documentation under Sample_Collector suggests the *first*
> > interpretation:
> >
> > “Probability: Sampling probability for this collector.”
> >
> > However, based on your earlier explanation and the OpenFlow action:
> >
> > flow_sample(probability=65535, collector_set_id=2, obs_domain_id=...,
> > obs_point_id=...)
> >
> > ... I’m inclined to believe the *second* interpretation is correct, since
> > each sample action is independently applied with its own metadata
> > (obs_point_id), even if they point to the same sample_collector.
> > ------------------------------
> >
> > Could you kindly confirm which interpretation is correct?
> > ------------------------------
>
> You're right, the second interpretation is correct. The probability is
> associated with the OVS sample() actions the resulting openflows will
> have. So, with your example, for the two ACLs we'll have two different
> openflow rules with different sample actions so the probability will be
> applied per flow.
>
> Maybe we should update the OVN documentation to make it clearer.
>
> > 📖 Sample Performance
> >
> > Thank you for pointing me to your OVSCON'24 presentation; I had missed it
> > earlier. It was very informative and gave me a much better understanding of
> > the potential performance bottlenecks in the current sampling design.
> >
> > I'll make sure to explore those aspects further in my upcoming tests.
> >
>
> Ack.
>
> > Regarding *psample*, I’d be happy to evaluate its performance when it’s
> > ready or when support becomes stable in OVN environments. It seems like a
> > promising direction to offload sampling and reduce vswitchd overhead.
> >
>
> Yes, that was the goal, reduce latency and reduce load on vswitchd.
>
> The psample support is stable (part of OVS release 3.4.0 - August 2024).
> The only requirement is that the running kernel also supports the
> datapath action.
>
> Regards,
> Dumitru
>
> > Best regards,
> >
> > *Oscar*
> >
> > On Fri, May 16, 2025 at 8:03 PM Dumitru Ceara <[email protected]> wrote:
> >
> >> On 5/16/25 6:07 AM, Trọng Đạt Trần wrote:
> >>> Dear Dumitru,
> >>>
> >>
> >> Hi Oscar,
> >>
> >>> Thank you for confirming the bug; I’m happy to help however I can.
> >>> ------------------------------
> >>> I. Temporary Workaround & Feedback
> >>>
> >>> To work around the IPFIX duplication issue in the meantime, I’ve
> >>> implemented a post-processing filter that divides duplicate samples by two.
> >>> The logic relies on two elements:
> >>>
> >>> 1. *Source and destination MAC addresses* to detect reply traffic from
> >>>    VM → router port.
> >>> 2. *Sample metadata* (from the sample entry) to ensure that the match comes
> >>>    from a to-lport ACL.
> >>>
> >>> This combination seems to reliably identify duplicated samples. I've tested
> >>> this across multiple scenarios and it works well so far.
> >>>
> >>> *Do you foresee any edge cases where this workaround might break down or
> >>> behave incorrectly?*
> >>
> >> At a first glance this seems OK to me.
> >>
> >>> ------------------------------
> >>> II. Questions Regarding OVN Sampling
> >>>
> >>> 1. *Sample Collector Table Limits*
> >>>
> >>> In my deployment, multiple teams share the network, but generate highly
> >>> imbalanced traffic. For example:
> >>>
> >>> - Team A sends 90% of total traffic.
> >>> - Team B sends only 10%.
> >>>
> >>> If I configure a shared sample_collector with probability = 6553 (≈10%),
> >>> there’s a chance Team A may generate most or all samples while Team B’s
> >>> traffic may not be captured at all.
> >>>
> >>
> >> Is traffic from Team A and Team B hitting the same ACLs? Can't the ACLs
> >> be partitioned (different port groups) per team? Then you'd be able to
> >> use different Sample.metadata for different teams.
> >>
> >>> Furthermore, the IPFIX table in the ovsdb would set cache_max_flows limits,
> >>> so Team A and Team B could not be configured on the same set_id.
> >>>
> >>> To solve this, I configure one sample_collector per team (different set_ids),
> >>> so each has independent sampling:
> >>>
> >>> sample_collector "team_a": id=2, set_id=2
> >>> sample_collector "team_b": id=1, set_id=1
> >>>
> >>> This setup works, but it introduces a potential limitation:
> >>>
> >>> - Since set_id is limited to 256 values, we can only support up to 256
> >>>   teams (or tenants).
> >>> - In multi-tenant environments, this ceiling may be too low.
> >>>
> >>> Would it make sense to consider increasing this limit?
> >>
> >> Actually, the set_id shouldn't be limited to 8 bits, it can be any 32-bit
> >> value according to the schema:
> >>
> >>     "set_id": {"type": {"key": {
> >>               "type": "integer",
> >>               "minInteger": 1,
> >>               "maxInteger": 4294967295}}},
> >>
> >> As a side thing, now that you mention this, we only use the 8 LSB as
> >> set_id in the flows we generate. I think that's a bug and we should
> >> fix it.
> >> I posted a patch here:
> >>
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2025-May/423409.html
> >>
> >> However, there is indeed a limit that allows at _most_ 255 unique
> >> Sample_Collector NB records:
> >>
> >>     "Sample_Collector": {
> >>         "columns": {
> >>             "id": {"type": {"key": {
> >>                 "type": "integer",
> >>                 "minInteger": 1,
> >>                 "maxInteger": 255}}},
> >>
> >> That's because we need to store the NB Sample_Collector ID in the
> >> conntrack mark of the session we're sampling. CT mark is a 32-bit
> >> value and we use some bits in it for other features:
> >>
> >>     expr_symtab_add_subfield_scoped(symtab, "ct_mark.obs_collector_id",
> >>                                     NULL,
> >>                                     "ct_mark[16..23]", WR_CT_COMMIT);
> >>
> >> Looking at the current code I _think_ we have 8 more bits
> >> available. However, expanding the ct_mark.obs_collector_id to use
> >> the whole remainder of ct_mark (64K values) seems "risky" because
> >> we don't know beforehand if we'll need more bits for other features
> >> in the future.
> >>
> >> Do you have a suggestion of reasonable maximum limit for the number
> >> of teams (users) in your use case?
> >>
> >>> 2. *Sampling Performance Considerations*
> >>>
> >>> Here is my current understanding; I’d appreciate confirmation or
> >>> corrections:
> >>>
> >>> - Sampling performance is not heavily dependent on ovn-northd or
> >>>   ovn-controller, since the generation of the sampling flow is
> >>>   insignificant compared to many other features.
> >>> - In ovs-vswitchd, both memory and CPU usage scale roughly linearly with
> >>>   the number of active OpenFlow rules using sample(...) actions and the
> >>>   rate at which those samples are triggered and exported.
> >>> - Under high load, performance can be tuned using the cache_active_timeout
> >>>   and cache_max_flows fields in the IPFIX table.
> >>>   These parameters control export frequency and the size of the flow
> >>>   cache, allowing a balance between monitoring fidelity and resource
> >>>   efficiency.
> >>>
> >>> Is this an accurate summary? Or are there other scaling or bottleneck
> >>> factors I should consider?
> >>
> >> I'm not sure if you're aware but OVS (with the kernel netlink datapath and
> >> on relatively new kernels) supports a different way of sampling, psample.
> >>
> >> https://github.com/openvswitch/ovs/commit/1a3bd96
> >>
> >> This avoids sending packets to vswitchd altogether and allows better
> >> sampling performance.
> >>
> >> This might give more insights, a presentation from OVSCON'24 with an
> >> end-to-end solution for sampling network policies (ACLs) with psample in
> >> ovn-kubernetes:
> >>
> >> https://www.youtube.com/watch?v=gLwDsaiUuN4&t=2s
> >>
> >>> 3. *Separate Bug Regarding ACL Tier and Sampling*
> >>>
> >>> I’ve also observed an issue related to sampling and ACL tier interactions.
> >>> Would you prefer I continue in this thread or open a new one?
> >>>
> >>
> >> It might be better to start a new thread. Thanks again for trying this
> >> new feature out!
> >>
> >>> Happy to follow your preferred workflow.
> >>> ------------------------------
> >>>
> >>> Thanks again for your time and support.
> >>>
> >>> Best regards,
> >>> *Oscar*
> >>>
> >>
> >> Best regards,
> >> Dumitru
> >>
> >>> On Wed, May 14, 2025 at 5:10 PM Dumitru Ceara <[email protected]> wrote:
> >>>
> >>>> Hi Oscar,
> >>>>
> >>>> On 5/13/25 1:04 PM, Dumitru Ceara wrote:
> >>>>> On 5/13/25 11:06 AM, Trọng Đạt Trần wrote:
> >>>>>> Dear Dumitru,
> >>>>>>
> >>>>>
> >>>>> Hi Oscar,
> >>>>>
> >>>>>> In the previous days, I’ve performed additional tests to gain better
> >>>>>> understanding around the issue before giving you the details.
> >>>>>>
> >>>>>> Thank you for your earlier explanation, it clarified how conntrack and
> >>>>>> sampling work in the simple "|vm1 --- ls --- vm2|" topology. However, I
> >>>>>> believe my original observations still hold in router-related topologies.
> >>>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> Setup Recap
> >>>>>>
> >>>>>> *Topology*: vm_a (10.2.1.5) --- ls1 --- router --- ls2 --- vm_b (10.2.3.5)
> >>>>>>
> >>>>>> ACLs applied to a shared Port Group (|pg_d559...|):
> >>>>>>
> >>>>>> * *ACL A*: |from-lport| – allow-related IPv4 (sample_est = |2000000|)
> >>>>>> * *ACL B*: |to-lport| – allow-related ICMP (sample_est = |1000000|)
> >>>>>>
> >>>>>> *Sample configuration*:
> >>>>>>
> >>>>>> * ACL A: direction=from-lport, match="inport == @pg && ip4",
> >>>>>>   sample_est=2000000
> >>>>>> * ACL B: direction=to-lport, match="outport == @pg && ip4 && icmp4",
> >>>>>>   sample_est=1000000
> >>>>>>
> >>>>>> # ovn-nbctl acl-list pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc
> >>>>>> from-lport 1002 (inport == @pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc && ip4) allow-related
> >>>>>> to-lport 1002 (outport == @pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc && ip4 && ip4.src == 0.0.0.0/0 && icmp4) allow-related
> >>>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> Expected Behavior (based on your explanation)
> >>>>>>
> >>>>>> * *First ICMP request*: no sample (ct=new).
> >>>>>>
> >>>>>> * *First ICMP reply*:
> >>>>>>
> >>>>>>   o One sample from *ingress pipeline* (sample_est = |1000000|)
> >>>>>>   o One sample from *egress pipeline* (sample_est = |2000000|)
> >>>>>>     → *Total: 2 samples* for reply --> True
> >>>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> Actual Behavior Observed
> >>>>>>
> >>>>>> On the *first ICMP reply*, I see:
> >>>>>>
> >>>>>> * *3 samples total*:
> >>>>>>
> >>>>>>   o *2 samples* in the *ingress pipeline*, both with |obs_point_id=1000000|
> >>>>>>   o *1 sample* in the egress pipeline, with |obs_point_id=2000000|
> >>>>>>
> >>>>>> This results in *duplicated sampling actions for a single logical
> >>>>>> datapath flow* within the ingress pipeline.
> >>>>>>
> >>>>>> Evidence:
> >>>>>>
> >>>>>> # ovs-dpctl dump-flows | grep 10.2.1.5
> >>>>>> recirc_id(0x1d5),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0x20020/0xff0031),ct_label(0xf4240000000000000000000000000),eth(src=fa:16:3e:6b:42:8e,dst=fa:16:3e:dd:02:c0),eth_type(0x0800),ipv4(src=10.2.1.5,dst=10.2.3.5,proto=1,ttl=64,frag=no), packets:299, bytes:29302, used:0.376s, actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,output_port=4294967295)),userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,output_port=4294967295)),ct_clear,set(eth(src=fa:16:3e:d5:7b:d1,dst=fa:16:3e:f8:af:7d)),set(ipv4(ttl=63)),ct(zone=21),recirc(0x1d6)
> >>>>>> # recirc_id(0x1d5): two flow_sample(...) actions with same metadata (1000000)
> >>>>>> recirc_id(0x1d6),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0x20000/0xff0031),ct_label(0x1e8480000000000000000000000000),eth(dst=fa:16:3e:f8:af:7d),eth_type(0x0800),ipv4(dst=10.2.3.5,frag=no), packets:299, bytes:29302, used:0.376s, actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554439,obs_point_id=2000000,output_port=4294967295)),9
> >>>>>> # plus one flow_sample(...) later in the pipeline with metadata (2000000)
> >>>>>>
> >>>>>> Also confirmed via IPFIX stats:
> >>>>>>
> >>>>>> # IPFIX before ping
> >>>>>> sampled pkts: 192758
> >>>>>> # After a single ping
> >>>>>> sampled pkts: 192761 → Δ = 3
> >>>>>>
> >>>>>> Additional Findings
> >>>>>>
> >>>>>> * The issue *only occurs* when VMs are on *separate logical switches
> >>>>>>   connected by a router*.
> >>>>>> * If both VMs are on the *same logical switch*, IPFIX is correctly
> >>>>>>   sampled only once per ACL.
> >>>>>> * The duplicated sampling occurs *even if ACL A (IPv4) and ACL C
> >>>>>>   (IPv6) are unrelated*, as long as both have |sample_est| and belong
> >>>>>>   to the same Port Group.
> >>>>>> * The error can be reproduced *even when only vm_a's Port Group has
> >>>>>>   the sampling ACLs*. vm_b does not require any sampling configuration
> >>>>>>   for the issue to occur.
> >>>>>>
> >>>>>
> >>>>> Thanks a lot for the follow up! You're right, this is indeed a bug.
> >>>>> And that's because we don't clear the packet's ct_state (well all
> >>>>> conntrack related information) when advancing to the egress pipeline of
> >>>>> a switch when the outport is one connected to a router.
> >>>>>
> >>>>> That's due to https://github.com/ovn-org/ovn/commit/d17ece7 where we
> >>>>> chose to skip ct_clear if the switch has stateful (allow-related) ACLs:
> >>>>>
> >>>>> "Also, this patch does not change the behavior for ACLs such as
> >>>>> allow-related: packets are still sent to conntrack, even for router
> >>>>> ports. While this does not work if router ports are distributed,
> >>>>> allow-related ACLs work today on router ports when those ports are
> >>>>> handled on the same chassis for ingress and egress traffic. This patch
> >>>>> does not change that behavior."
> >>>>>
> >>>>> On a second look, the above reasoning seems wrong. It doesn't sound OK
> >>>>> to rely on conntrack state retrieved from a CT zone that's not assigned
> >>>>> to the logical port we're processing the packet on.
> >>>>>
> >>>>> I'm going to think about the right way to fix this issue and come back
> >>>>> to this thread once it's figured out.
> >>>>>
> >>>>
> >>>> It turns out the fix is not necessarily that straightforward. There
> >>>> are a few different ways to address this though. As we (Red Hat) are
> >>>> also using this feature, I opened a ticket in our internal tracking
> >>>> system so that we analyze it in more depth.
> >>>>
> >>>> https://issues.redhat.com/browse/FDP-1408
> >>>>
> >>>> However, if the OVN community in general is willing to look at fixing
> >>>> this bug that would be great too.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>>> Thanks again for the bug report!
> >>>>>
> >>>>> Regards,
> >>>>> Dumitru
> >>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> Another Reproducible Scenario (Minimal)
> >>>>>>
> >>>>>> Port Group A on |vm_a| with:
> >>>>>>
> >>>>>> * ACL A: |from-lport| IP4 (sample_est or not)
> >>>>>> * ACL B: |to-lport| ICMP |sample_est=1000000|
> >>>>>> * ACL C: |from-lport| IP6 |sample_est=2000000|
> >>>>>>
> >>>>>> Port Group B on |vm_b|:
> >>>>>>
> >>>>>> * No sampling required
> >>>>>> * ACL to allow from-lport and to-lport traffic
> >>>>>>
> >>>>>> When pinging |vm_a| from |vm_b|, the ICMP reply still results in *two
> >>>>>> samples with |obs_point_id=1000000|*.
> >>>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> 📌 Key Takeaway
> >>>>>>
> >>>>>> I believe this confirms the IPFIX duplication issue is *not due to
> >>>>>> conntrack behavior*, but rather due to *how multiple ACLs with
> >>>>>> sample_est on the same Port Group (in different directions) result in
> >>>>>> two |userspace(flow_sample(...))| actions* in the same flow.
> >>>>>>
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> To avoid overloading the email, I’ve included more detailed output
> >>>>>> and explanations in the attachment.
> >>>>>>
> >>>>>> This email uses formatting elements such as icons, headers, and
> >>>>>> dividers for clarity. If you experience any display issues, please
> >>>>>> let me know and I’ll avoid using them in future messages.
> >>>>>>
> >>>>>> Please tell me if I can run any additional traces. I’m happy to
> >>>>>> assist further.
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> *Oscar*
> >>>>>>
> >>>>>> On Fri, May 9, 2025 at 7:16 PM Dumitru Ceara <[email protected]> wrote:
> >>>>>>
> >>>>>>     On 5/9/25 2:14 PM, Dumitru Ceara wrote:
> >>>>>>     > On 5/9/25 5:38 AM, Trọng Đạt Trần wrote:
> >>>>>>     >> Hi Dumitru,
> >>>>>>     >>
> >>>>>>     >
> >>>>>>     > Hi Oscar,
> >>>>>>     >
> >>>>>>     >> Thank you for pointing that out.
> >>>>>>     >>
> >>>>>>     >> To clarify: the terms “inbound” and “outbound” in my previous message
> >>>>>>     >> were used from the *VM’s perspective*.
> >>>>>>     >>
> >>>>>>     >> Topology:
> >>>>>>     >>
> >>>>>>     >> |vm_a ---- network1 ---- router ---- network2 ---- vm_b|
> >>>>>>     >>
> >>>>>>     >> ACLs:
> >>>>>>     >>
> >>>>>>     >> * *ACL A*: allow-related VMs to *send* IPv4 traffic
> >>>>>>     >>   (|direction=from-lport|)
> >>>>>>     >> * *ACL B*: allow-related VMs to *receive* ICMP traffic
> >>>>>>     >>   (|direction=to-lport|)
> >>>>>>     >>
> >>>>>>     >> I’ve attached both the *Northbound and Southbound database dumps* to
> >>>>>>     >> ensure the full context is available.
> >>>>>>     >>
> >>>>>>     >
> >>>>>>     > Thanks for the info, I tried locally with a simplified setup where I
> >>>>>>     > emulate your topology:
> >>>>>>     >
> >>>>>>     > switch c9c171ef-849c-436d-b3f9-73d83b9c4e5d (ls)
> >>>>>>     >     port vm2
> >>>>>>     >         addresses: ["00:00:00:00:00:02"]
> >>>>>>     >     port vm1
> >>>>>>     >         addresses: ["00:00:00:00:00:01"]
> >>>>>>     >
> >>>>>>     > Those two VIFs are in a port group:
> >>>>>>     >
> >>>>>>     > # ovn-nbctl list port_group
> >>>>>>     > _uuid               : 7e7a96b9-e708-4eea-b380-018314f2435c
> >>>>>>     > acls                : [1d0e7b71-ff03-4c78-ace4-2448bf237e11,
> >>>>>>     >                        7cb023e9-fee5-4576-a67d-ce1f5d98805b]
> >>>>>>     > external_ids        : {}
> >>>>>>     > name                : pg
> >>>>>>     > ports               : [d991baa6-21b0-4d46-a15d-71b9e8d6708d,
> >>>>>>     >                        f2c5679c-d891-4d34-8402-8bc2047fba61]
> >>>>>>     >
> >>>>>>     > With two ACLs applied:
> >>>>>>     > # ovn-nbctl acl-list pg
> >>>>>>     > from-lport   100 (inport==@pg && ip4) allow-related
> >>>>>>     >   to-lport   200 (outport==@pg && ip4 && icmp4) allow-related
> >>>>>>     >
> >>>>>>     > Both ACLs have only sampling for established traffic (sample_est) set:
> >>>>>>     > # ovn-nbctl list acl
> >>>>>>     > _uuid               : 1d0e7b71-ff03-4c78-ace4-2448bf237e11
> >>>>>>     > action              : allow-related
> >>>>>>     > direction           : from-lport
> >>>>>>     > match               : "inport==@pg && ip4"
> >>>>>>     > priority            : 100
> >>>>>>     > sample_est          : 23153fae-0a73-4f86-bdf2-137e76647da8
> >>>>>>     > sample_new          : []
> >>>>>>     >
> >>>>>>     > _uuid               : 7cb023e9-fee5-4576-a67d-ce1f5d98805b
> >>>>>>     > action              : allow-related
> >>>>>>     > direction           : to-lport
> >>>>>>     > match               : "outport==@pg && ip4 && icmp4"
> >>>>>>     > priority            : 200
> >>>>>>     > sample_est          : 42391c82-23d2-4f2b-a7b9-88afaa68282c
> >>>>>>     > sample_new          : []
> >>>>>>     >
> >>>>>>     > # ovn-nbctl list sample
> >>>>>>     > _uuid               : 23153fae-0a73-4f86-bdf2-137e76647da8
> >>>>>>     > collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
> >>>>>>     > metadata            : 2000000
> >>>>>>     >
> >>>>>>     > _uuid               : 42391c82-23d2-4f2b-a7b9-88afaa68282c
> >>>>>>     > collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
> >>>>>>     > metadata            : 1000000
> >>>>>>     >
> >>>>>>     > Then I send a single ICMP echo packet from vm2 towards vm1. The ICMP
> >>>>>>     > echo hits both ACLs but, because it's the packet initiating the session,
> >>>>>>     > it doesn't generate a sample (sample_new is not set in the ACLs). Instead
> >>>>>>     > 2 conntrack entries are created for the ICMP session:
> >>>>>>     >
> >>>>>>     > - one in the CT zone of vm2 - here the from-lport ACL is hit so the
> >>>>>>     > sample_est metadata of the from-lport ACL (2000000) is stored along in
> >>>>>>     > the conntrack state
> >>>>>>     >
> >>>>>>     > - one in the CT zone of vm1 - here the to-lport ACL is hit so the
> >>>>>>     > sample_est metadata of the to-lport ACL (1000000) is stored along in the
> >>>>>>     > conntrack state
> >>>>>>     >
> >>>>>>     > The ICMP echo packet reaches vm1 which replies with ICMP ECHO Reply.
> >>>>>>     >
> >>>>>>     > For the reply the CT zone of vm1 is first checked, we match the existing
> >>>>>>     > conntrack entry (its state moves to "established") and a sample for the
> >>>>>>     > stored metadata, 1000000, is generated. Then, in the egress pipeline,
> >>>>>>     > the CT zone of vm2 is checked, we match the other existing conntrack
> >>>>>>     > entry (its state also moves to "established") and a sample for the
> >>>>>>     > stored metadata, 2000000, is generated.
> >>>>>>     >
> >>>>>>     > This seems correct to me. Stats also seem to confirm that:
> >>>>>>     > # ip netns exec vm2 ping 42.42.42.2 -c1
> >>>>>>     > PING 42.42.42.2 (42.42.42.2) 56(84) bytes of data.
> >>>>>>     > 64 bytes from 42.42.42.2: icmp_seq=1 ttl=64 time=1.46 ms
> >>>>>>     >
> >>>>>>     > --- 42.42.42.2 ping statistics ---
> >>>>>>     > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>>>>>     > rtt min/avg/max/mdev = 1.455/1.455/1.455/0.000 ms
> >>>>>>     >
> >>>>>>     > # ovs-ofctl dump-ipfix-flow br-int
> >>>>>>     > NXST_IPFIX_FLOW reply (xid=0x2): 1 ids
> >>>>>>     >   id   2: flows=2, current flows=0, sampled pkts=2, ipv4 ok=2, ipv6 ok=0, tx pkts=11
> >>>>>>     >           pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=11
> >>>>>>     >
> >>>>>>     > But then, when I increase the number of packets things become more
> >>>>>>     > interesting. ICMP echos also generate samples. And while that might
> >>>>>>     > seem like a bug, it's not. :)
> >>>>>>     >
> >>>>>>     > When ping sends multiple packets for a single invocation it uses the
> >>>>>>     > same ICMP ID and just increments the ICMP seq, e.g.:
> >>>>>>     >
> >>>>>>     > 14:07:41.986618 00:00:00:00:00:02 > 00:00:00:00:00:01, ethertype IPv4
> >>>>>>     > (0x0800), length 98: (tos 0x0, ttl 64, id 58647, offset 0, flags [DF],
> >>>>>>     > proto ICMP (1), length 84)
> >>>>>>     >     42.42.42.3 > 42.42.42.2: ICMP echo request, id 35717, seq 1, length 64
> >>>>>>     >
> >>>>>>     > 14:07:42.988077 00:00:00:00:00:02 > 00:00:00:00:00:01, ethertype IPv4
> >>>>>>     > (0x0800), length 98: (tos 0x0, ttl 64, id 59085, offset 0, flags [DF],
> >>>>>>     > proto ICMP (1), length 84)
> >>>>>>     >     42.42.42.3 > 42.42.42.2: ICMP echo request, id 35717, seq 2, length 64
> >>>>>>     >
> >>>>>>     > But conntrack doesn't use the ICMP ID in the key for the session it
> >>>>>>     > installs:
> >>>>>>
> >>>>>>     Sorry about the typo, I meant to say "conntrack doesn't use the ICMP SEQ
> >>>>>>     in the key for the session it installs, it only uses the ICMP ID".
> >>>>>>
> >>>>>>     >
> >>>>>>     > # ovs-appctl dpctl/dump-conntrack | grep 42.42.42
> >>>>>>     > icmp,orig=(src=42.42.42.3,dst=42.42.42.2,id=35628,type=8,code=0),reply=(src=42.42.42.2,dst=42.42.42.3,id=35628,type=0,code=0),zone=4,mark=131104,labels=0xf4240000000000000000000000000
> >>>>>>     > icmp,orig=(src=42.42.42.3,dst=42.42.42.2,id=35628,type=8,code=0),reply=(src=42.42.42.2,dst=42.42.42.3,id=35628,type=0,code=0),zone=6,mark=131072,labels=0x1e8480000000000000000000000000
> >>>>>>     >
> >>>>>>     > So, subsequent ICMP requests will match on these two existing
> >>>>>>     > established entries and (because sample_est is configured) samples are
> >>>>>>     > generated for them too.
> >>>>>>     >
> >>>>>>     > That's also visible in the datapath flows that forward packets in the
> >>>>>>     > "original" direction (ICMP ECHOs in our case):
> >>>>>>     >
> >>>>>>     > # ovs-appctl dpctl/dump-flows | grep sample | grep '\-rpl'
> >>>>>>     > recirc_id(0x29),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0x20000/0xff0071),ct_label(0x1e8480000000000000000000000000),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:01),eth_type(0x0800),ipv4(proto=1,frag=no), packets:8, bytes:784, used:2.342s, actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554434,obs_point_id=2000000,output_port=4294967295)),ct(commit,zone=6,mark=0x20000/0xff0071,label=0x1e8480000000000000000000000000/0xffffffffffff00000000000000000000,nat(src)),ct(zone=4),recirc(0x2a)
> >>>>>>     >
> >>>>>>     > recirc_id(0x2a),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0x20020/0xff0071),ct_label(0xf4240000000000000000000000000),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no), packets:8, bytes:784, used:2.342s, actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554434,obs_point_id=1000000,output_port=4294967295)),ct(commit,zone=4,mark=0x20020/0xff0071,label=0xf4240000000000000000000000000/0xffffffffffff00000000000000000000,nat(src)),1
> >>>>>>     >
> >>>>>>     > So, for a less complicated test, maybe you should try with UDP/TCP
> >>>>>>     > instead.
> >>>>>>     >
> >>>>>>     > I hope that clarifies your doubts.
> >>>>>>     >
> >>>>>>     > Best regards,
> >>>>>>     > Dumitru
> >>>>>>     >
> >>>>>>     >> Best regards,
> >>>>>>     >>
> >>>>>>     >> Oscar
> >>>>>>     >>
> >>>>>>     >> On Thu, May 8, 2025 at 8:11 PM Dumitru Ceara <[email protected]> wrote:
> >>>>>>     >>
> >>>>>>     >>     Hi Oscar,
> >>>>>>     >>
> >>>>>>     >>     On 5/6/25 12:31 PM, Trọng Đạt Trần wrote:
> >>>>>>     >>     > As requested, I’ve attached additional tracing information related to
> >>>>>>     >>     > the sampling duplication issue.
> >>>>>>     >>     >
> >>>>>>     >>     > * The file |ofproto_trace.log| contains the full output of
> >>>>>>     >>     >   |ofproto/trace| commands.
> >>>>>>     >>     > * The archive |ovn-detrace.tar.gz| includes six separate files, each
> >>>>>>     >>     >   corresponding to an |ovn-detrace| output for a flow I believe is
> >>>>>>     >>     >   involved in the duplicated sampling.
> >>>>>>     >>     >
> >>>>>>     >>     > Since I’m not fully confident in how to use the |--ct-next| option, I’ve
> >>>>>>     >>     > included traces for all six related flows to ensure completeness.
> >>>>>>     >>     >
> >>>>>>     >>     > Please let me know if you need further details, or if I should re-run
> >>>>>>     >>     > any commands with additional options.
> >>>>>>     >>     >
> >>>>>>     >>
> >>>>>>     >>     This seems fairly easy to reproduce locally for investigation; I didn't
> >>>>>>     >>     try yet though.
> >>>>>>     >>     However, would you mind sharing your OVN NB database
> >>>>>>     >>     file (I'm assuming this is a test environment)?
> >>>>>>     >>
> >>>>>>     >>     I would like to make sure we don't have any misunderstanding because the
> >>>>>>     >>     terms you use below in your ACL description (e.g., "outbound"/"inbound")
> >>>>>>     >>     are not standard terms. Having the actual ACL (and the rest of the NB)
> >>>>>>     >>     contents will make it easier to debug.
> >>>>>>     >>
> >>>>>>     >>     Thanks,
> >>>>>>     >>     Dumitru
> >>>>>>     >>
> >>>>>>     >>     > Best regards,
> >>>>>>     >>     >
> >>>>>>     >>     > *Oscar*
> >>>>>>     >>     >
> >>>>>>     >>     > On Tue, May 6, 2025 at 4:15 PM Adrián Moreno <[email protected]> wrote:
> >>>>>>     >>     >
> >>>>>>     >>     >     On Tue, May 06, 2025 at 11:48:07AM +0700, Trọng Đạt Trần wrote:
> >>>>>>     >>     >     > Dear Adrián,
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > Thank you for your response. I’ve applied your suggestion to use separate
> >>>>>>     >>     >     > sample entries for each ACL. However, I am still seeing unexpected behavior
> >>>>>>     >>     >     > in the IPFIX output that I’d like to clarify.
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > Test Setup (Same as Before)
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > vm_a ---- network1 ---- router ---- network2 ---- vm_b
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > - Two ACLs:
> >>>>>>     >>     >     >   - ACL A: allow-related *outbound* IPv4
> >>>>>>     >>     >     >   - ACL B: allow-related *inbound* ICMP
> >>>>>>     >>     >     > - ACLs applied symmetrically to both VMs.
> >>>>>>     >>     >     > - Test traffic: ICMP request from vm_b to vm_a, and reply from
> >>>>>>     >>     >     >   vm_a to vm_b.
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > Key Problem Observed
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > When sampling is enabled on *both* ACLs, the IPFIX record for *flow (3)*
> >>>>>>     >>     >     > (the ICMP reply from vm_a → router) shows *120 packets/min*.
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > However:
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > - If *only ACL B* (inbound ICMP) is sampled → (3) = 60 packets/min
> >>>>>>     >>     >     > - If *only ACL A* (outbound IP4) is sampled → (3) not present
> >>>>>>     >>     >     > - If both are sampled → (3) = 120 packets/min
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > This suggests that *flow (3) is being sampled twice*, even though it
> >>>>>>     >>     >     > represents a *single logical flow and matches only ACL B*.
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > IPFIX Observations
> >>>>>>     >>     >     >
> >>>>>>     >>     >     > Flow  Description                    Expected  Actual
> >>>>>>     >>     >     > (1)   vm_b → router (ICMP request)   60 pkt/m  60
> >>>>>>     >>     >     > (2)   router → vm_a (ICMP request)   60 pkt/m  60
> >>>>>>     >>     >     > (3)   vm_a → router (ICMP reply)     60 pkt/m  120 ⚠️
> >>>>>>     >>     >     > (4)   router → vm_b (ICMP reply)     60 pkt/m  60
> >>>>>>     >>     >
> >>>>>>     >>     >     This is not what I'd expect, maybe Dumitru knows?
> >>>>>>     >>     >
> >>>>>>     >>     >     Could you attach ofproto/trace and ovn-detrace outputs from both
> >>>>>>     >>     >     directions?
> >>>>>>     >>     >
> >>>>>>     >>     >     Thanks.
> >>>>>>     >>     >     Adrián
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
