> -----Original Message----- > From: Maxime Coquelin <maxime.coque...@redhat.com> > Sent: Wednesday, May 19, 2021 4:47 PM > To: Wang, Liang-min <liang-min.w...@intel.com>; Miskell, Timothy > <timothy.misk...@intel.com>; d...@openvswitch.org > Subject: Re: [PATCH] Extends the existing mirror configuration parameters > > > > On 5/19/21 4:17 PM, Wang, Liang-min wrote: > >> -----Original Message----- > >> From: Maxime Coquelin <maxime.coque...@redhat.com> > >> Sent: Wednesday, May 19, 2021 8:50 AM > >> To: Wang, Liang-min <liang-min.w...@intel.com>; Miskell, Timothy > >> <timothy.misk...@intel.com>; d...@openvswitch.org > >> Subject: Re: [PATCH] Extends the existing mirror configuration > parameters > >> > >> > >> > >> On 5/19/21 1:53 PM, Wang, Liang-min wrote: > >>>> -----Original Message----- > >>>> From: Maxime Coquelin <maxime.coque...@redhat.com> > >>>> Sent: Wednesday, May 19, 2021 3:56 AM > >>>> To: Wang, Liang-min <liang-min.w...@intel.com>; Miskell, Timothy > >>>> <timothy.misk...@intel.com>; d...@openvswitch.org > >>>> Subject: Re: [PATCH] Extends the existing mirror configuration > >> parameters > >>>> > >>>> Hi Liang-min, > >>>> > >>>> When replying inline, please do not prefix with ">>" as it is handled as > >>>> quoted text. There is no need to prefix. > >>>> > >>>> On 5/18/21 8:00 PM, Wang, Liang-min wrote: > >>>>>> -----Original Message----- > >>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com> > >>>>>> Sent: Tuesday, May 18, 2021 12:15 PM > >>>>>> To: Miskell, Timothy <timothy.misk...@intel.com>; > >> d...@openvswitch.org > >>>>>> Cc: Wang, Liang-min <liang-min.w...@intel.com> > >>>>>> Subject: Re: [PATCH] Extends the existing mirror configuration > >>>> parameters > >>>>>> > >>>>>> Hi Timothy, Liang-min, > >>>>>> > >>>>>> Thanks for rebasing the patch. > >>>>>> A list of delta against the first RFC could help the reviewers. > >>>>>> I notice one change in the right direction is the conversion to Vhost > >>>>>> API datapath instead of Vhost PMD. > >>>>>> > >>>>>>> In this patch we support both Vhost API and Vhost PMD because > OVS > >>>> supports both > >>>>>>> VhostUser and Vdev ports. > >>>>>>> > >>>>>> Also, I would suggest to have the patch split in several incremental > >>>>>> patches to ease the review. > >>>>>> > >>>>>>> Thank you for suggestion. We will provide incremental patches on > >> next > >>>> submission > >>>>>>> > >>>>>> On 5/10/21 6:00 PM, Timothy Miskell wrote: > >>>>>>> From: Liang-min Wang <liang-min.w...@intel.com> > >>>>>>> > >>>>>>> The following parameters are added: > >>>>>>> - mirror-offload: to turn on/off mirror offloading. > >>>>>>> - output-port-name: specify a port, using name string, that is on a > >>>> different > >>>>>>> bridge > >>>>>>> - output-src-vlan: output port vlan for each select-src-port. > >>>>>>> - output-dst-vlan: output port vlan for each select-dst-port. > >>>>>>> - flow-src-mac: use src mac address of each select-dst-port for the > >>>> header > >>>>>>> scan. > >>>>>>> - flow-dst-mac: use dst mac address of each select-src-port for the > >>>> header > >>>>>>> scan. > >>>>>>> - mirror-tunnel-addr: BDF string of the tunnel device. > >>>>>>> > >>>>>>> ovs-vsctl test change because new mirroring parameters are > >> introduced > >>>> in > >>>>>> this patch > >>>>>> > >>>>>> It would help to provide examples of usage of these new > parameters. > >>>>>> > >>>>>>> Will add examples in the new patches > >>>>>>> > >>>>>>> Create a defer procedure call thread to handle all mirror offload > >>>> requests. > >>>>>>> This is a light-weight thread which remains in sleep-state when > there > >> is > >>>> no > >>>>>> new request. > >>>>>>> This is created between ovs-vsctl and mirror offloading back end > >>>>>>> > >>>>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic > >>>>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks. > >>>>>>> Each callback functions implement the following tasks: > >>>>>>> 1. Enable per-packet VLAN insertion > >>>>>>> - for port mirroring, all packets are enabled per-packet VLAN > >> insertion. > >>>>>>> - for flow mirroring, only packet header matches the required mac > >>>>>> address > >>>>>>> are enabled. > >>>>>>> 2. Sending the packets to the specified transport port (output-port > in > >>>>>>> mirror offload configuration) > >>>>>>> - for port mirroring, all packets are sent to the transport port. > >>>>>>> - for flow mirroring, only matched packets are sent. > >>>>>>> 3. Restore each packet attributes (remove DPDK per-packet > offload > >>>> flag) > >>>>>> > >>>>>> I will for sure have more questions later, but please find a few > >>>>>> comments/questions below: > >>>>>> > >>>>>>> Signed-off-by: Liang-min Wang <liang-min.w...@intel.com> > >>>>>>> Tested-by: Timothy Miskell <timothy.misk...@intel.com> > >>>>>>> Suggested-by: Munish Mehan <mm6...@att.com> > >>>>>>> --- > >>>>>>> lib/automake.mk | 2 + > >>>>>>> lib/netdev-dpdk-mirror.c | 516 > >>>>>> +++++++++++++++++++++++++++++++++++++ > >>>>>>> lib/netdev-dpdk-mirror.h | 83 ++++++ > >>>>>>> lib/netdev-dpdk.c | 397 ++++++++++++++++++++++++++++ > >>>>>>> lib/netdev-provider.h | 16 ++ > >>>>>>> lib/netdev.c | 386 +++++++++++++++++++++++++++ > >>>>>>> lib/netdev.h | 16 ++ > >>>>>>> tests/ovs-vsctl.at | 2 + > >>>>>>> vswitchd/bridge.c | 271 ++++++++++++++++++- > >>>>>>> vswitchd/vswitch.ovsschema | 24 +- > >>>>>>> vswitchd/vswitch.xml | 50 ++++ > >>>>>>> 11 files changed, 1759 insertions(+), 4 deletions(-) > >>>>>>> create mode 100644 lib/netdev-dpdk-mirror.c > >>>>>>> create mode 100644 lib/netdev-dpdk-mirror.h > >>>>>>> > >>>> > >>>> ... > >>>> > >>>>>>> + > >>>>>>> +/* port/flow mirror traffic processors */ > >>>>>>> +static inline uint16_t > >>>>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct > rte_mbuf > >>>>>> **pkts, > >>>>>>> + uint16_t nb_pkts, void *user_params) > >>>>>>> +{ > >>>>>>> + struct mirror_param *data = user_params; > >>>>>>> + uint16_t i, dst_qidx, match_count = 0; > >>>>>>> + uint16_t pkt_trans; > >>>>>>> + uint16_t dst_port_id = data->dst_port_id; > >>>>>>> + uint16_t dst_vlan_id = data->dst_vlan_id; > >>>>>>> + struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data- > >>>>>>> max_burst_size]; > >>>>>>> + > >>>>>>> + if (nb_pkts == 0) { > >>>>>>> + return 0; > >>>>>>> + } > >>>>>>> + > >>>>>>> + if (nb_pkts > data->max_burst_size) { > >>>>>>> + VLOG_ERR("Per-flow batch size, %d, exceeds maximum > limit\n", > >>>>>> nb_pkts); > >>>>>>> + return 0; > >>>>>>> + } > >>>>>>> + > >>>>>>> + for (i = 0; i < nb_pkts; i++) { > >>>>>>> + if (data->custom_scan(pkts[i], user_params)) { > >>>>>>> + pkt_buf[match_count] = pkts[i]; > >>>>>>> + pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT; > >>>>>> > >>>>>> Does it work if the packet already has a VLAN inserted? > >>>>>> > >>>>>>> Good catch. The design is based upon no VLAN insertion offloading > is > >>>> applied on source traffic. > >>>> > >>>> That is a significant limitation, I haven not seen it documented in the > >>>> series. Also, the IEEE paper mentions using QinQ when the packet has > >>>> already a VLAN inserted, but it is not in the series. Is there a reason > >>>> QinQ is not possible? > >>>> > >>>> IIUC, that means the TAP VM would see traffic having no VLAN, > whereas > >> it > >>>> has in reality? > >>>> > >>> > >>> It does support 802.1q (single VLAN). I mis-understood your last email. > >>> This design assumes there is no PKT_TX_VLAN_PKT flag set on the > source > >> mbuf. > >> > >> I understood that assumption, and it seems OVS does not use VLAN > offload > >> currently so it should be OK (even though I would suggest adding a check > >> that the flag isn't already set not to mess up with the regular traffic > >> if one day it is added). > >> > >> My question was: > >> If you packet already has a VLAN tag *already* inserted (i.e. not to be > >> inserted), won't the mirrored packet lose that information? I don't > >> expect QinQ will be setup by the VF driver as > >> > > > > With this design the mirrored traffic won't lose VLAN tag. > > For single VLAN packets, this design will add an outer VLAN tag. > > The outer VLAN can be stripped if the destination VNF (vProbe) turn on > VLAN stripping. > > The existing DPDK library does support QinQ offload. BTW, this change is > not visible in > > source VNF (no need to change source VNF setting) because VLAN > insertion > > happens at mirror tunnel device. > > OK, I would have thought that setting PKT_TX_QINQ_PKT would be needed > to > achieve that and also ensuring that the mirror port supports > DEV_TX_OFFLOAD_QINQ_INSERT. > > > > >>> This assumption is based upon that VIRTIO spec. doesn't support > >>> per-packet VLAN insertion offloading. > >> > >> I am not sure to understand this comment, but for Vhost PMD we do the > >> VLAN insertion in SW: > >> > https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vh > >> ost.c#L447 > >> > > > >> I think this VLAN insertion in Vhost PMD is problematic twice with this > >> series: > >> 1. The offloaded VLAN insertion would be lost > >> 2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It > >> means the packet would be mirrored but would never reach its expected > >> destination as dropped by the Vhost PMD. > >> > > > > I see your pointes now. I agree with your assessment. We could either > > 1. support no Vhost PMD, or 2. check the offload flag at the run-time. > > For 1, how would you differentiate between a Vhost PMD port and any > other PMD? Also, I think Vhost is not the only PMD doing VLAN insert in > SW. > > I'm not sure to understand what you mean by checking the offload flag at > runtime. What would you do if the flag is set? Only solution I see would > be to do the insert in SW before doing the mirroring, but it would mean > an overhead for packets transmitted to a device that supports VLAN > offloading in HW. > Below is the new change on the "software tapping". The new change uses pre-allocated VLAN buffer to save VLAN-ID for each packet if "PKT_TX_VLAN_PK" is enabled in source traffic, and restore the overwritten meta data, vlan_tci, if it's necessary. ... memset(data->tag_buf, 0, nb_pkts * sizeof(uint16_t)); for (i = 0; i < nb_pkts; i++) { if (unlikely(pks[i]->ol_flags & PKT_TX_VLAN_PKT)) { data->tag_buf[i] = pkts[i]->vlan_tci; } pkts[i]->ol_flags |= PKT_TX_VLAN_PKT; pkts[i]->vlan_tci = dst_vlan_id; rte_mbuf_refcnt_update(pkts[i], 1); }
dst_qidx = data->queue_map[qidx]; rte_spinlock_lock(&data->locks[dst_qidx]); pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkts, nb_pkts); rte_spinlock_unlock(&data->locks[dst_qidx]); for (i = 0; i < nb_pkts; i++) { if (unlikely(data->tag_buf[i])) { pkts[i]->vlan_tci = data->tag_buf[i]; } else { pkts[i]->ol_flags &= ~PKT_TX_VLAN_PKT; pkts[i]->vlan_tci = 0; } } > > > >>> > >>>>>>> + pkt_buf[match_count]->vlan_tci = dst_vlan_id; > >>>>>>> + rte_mbuf_refcnt_update(pkt_buf[match_count], 1); > >>>>>> > >>>>>> > >>>>>> > >>>>>>> + match_count++; > >>>>>>> + } > >>>>>>> + } > >>>>>>> + > >>>>>>> + dst_qidx = (data->n_dst_queue > qidx)?qidx:(data- > >n_dst_queue > >> - > >>>> 1); > >>>>>> > >>>>>> Wouldn't it scale better with: > >>>>>> dst_qidx = qidx % data->n_dst_queue > >>>>>> ? > >>>>>> > >>>>>>> We tried to avoid using "%" operator. We could add "unlikely" and > the > >>>> suggested "%" to make improvement > >>>> > >>>> Not sure adding 'unlikely' is really necessary. The cost of the modulo > >>>> operation is nothing compared to all we do in this path. > >>>> > >>>>>>> > >>>>>>> + > >>>>>>> + rte_spinlock_lock(&data->locks[dst_qidx]); > >>>>>>> + pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, > >>>>>> match_count); > >>>>>>> + rte_spinlock_unlock(&data->locks[dst_qidx]); > >>>>>>> + > >>>>>>> + for (i = 0; i < match_count; i++) { > >>>>>>> + pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT; > >>>>>>> + } > >>>>>> > >>>>>> In order to further reduce the performance impact of mirroring, > have > >> you > >>>>>> envisaged to offload it to dedicated PMD threads? > >>>>>> > >>>>>>> The mirror-tunnel design is a comprised approach between > hardware > >>>> TAP and software TAP. > >>>>>>> The tunnel itself is designed to have very little impact on source > traffic > >>>> processing core. From > >>>>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only > observed > >>>> 10-20% impact on 64-byte packets, and > >>>>>>> we did not observe impact when running traffic with packet size > with > >>>> 128-byte or above. > >>>> > >>>> The cover letter mentions 20-30% for 64B packets, and the IEEE paper > >>>> seems to indicate a significant impact up to 512B packets (~25%?). > >>>> > >>>> Maybe the workload is different in these tests, that could explain why > >>>> you don't see an impact starting 128B? > >>>> > >>> > >>> I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV > mirroring. > >> > >> What you mean by SRIOV mirroring is ingress traffic on one VF is > >> mirrored to another VF? Cannot it be done directly in HW? > >> > > > > The SR-IOV performance number provided here because of prior ask for > additional > > PMD core for scaling. > > OK > > >> Regarding Virtio port mirroring, we can see in the benchmark results you > >> presented at OVSCon 20 a significant impact for ingress mirroring for > >> all packet size with your VLAN offload solution. I agree this is much > >> better than with current mirroring solution, but there should be room > >> for improvement. > >> > >>> The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default > >>> Implementation) not this design. > >> > >> No, this is with your mirroring solution. See the green bar in the > >> benchmark results in the paper. > >> > > > > I see the disparity here. The 25% is from the 1st IEEE paper where no mirror > tunnel > > mechanism is devised and there is no visible degradation on 512B packets > over > > our 2nd IEEE paper (expected to be published on June, > https://im2021.ieee-im.org/) > > Now; I'm curious about what has changed in the design between the two > papers that would explain that? :) Because what the first paper presents > seems quite similar to what is implemented here. > The major difference between this upstream and last upstream (RFC) is the new mirror tunnel apparatus. With mirror tunnel design, 1. The source traffic device would be using vector PMD instead of scalar PMD because of the VLAN insertion offloading. With mirror tunnel design, the VLAN insertion offloading is turned on only on mirror tunnel device. 2. The mirror tunnel is a dedicated device used only for transmitting mirrored traffic (Tx only). Therefore, we enable Tx loopback on mirror tunnel device which saves PCIe BW -- improving throughput when running high throughput traffic for benchmark. > >>>>>>> > >>>>>>> + > >>>>>>> + while (unlikely (pkt_trans < match_count)) { > >>>>>>> + rte_pktmbuf_free(pkt_buf[pkt_trans]); > >>>>>>> + pkt_trans++; > >>>>>>> + } > >>>>>>> + > >>>>>>> + return nb_pkts; > >>>>>>> +} > >>>>>>> + > >>>> > >>>> ... > >>>> > >>>>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > >>>>>>> index 4597a215d..fd2049a7f 100644 > >>>>>>> --- a/vswitchd/vswitch.xml > >>>>>>> +++ b/vswitchd/vswitch.xml > >>>>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set > Interface > >> p0 > >>>>>> type=patch options:peer=p1 \ > >>>>>>> selected VLANs. > >>>>>>> </p> > >>>>>>> > >>>>>>> + <column name="mirror_tunnel_addr"> > >>>>>>> + BDF string of the tunnel device on which mirrored traffic > >>>>>>> will > be > >>>>>>> + transmitted. > >>>>>>> + </column> > >>>>>>> + > >>>>>>> <column name="select_all"> > >>>>>>> If true, every packet arriving or departing on any port is > >>>>>>> selected for mirroring. > >>>>>>> </column> > >>>>>>> > >>>>>>> + <column name="mirror_offload"> > >>>>>>> + If true, a hw-assisted port mirroring is configured instead > >>>>>>> + default mirroring. > >>>>>>> + </column> > >>>>>>> + > >>>>>>> + <column name="flow_src_mac"> > >>>>>>> + The source MAC address(es) for per-flow mirroring. Each MAC > >>>>>>> + address is separate by ','. This parametr is paired with > >>>>>>> + select_dst_port. A '0' MAC address indicates the requested > >> mirror > >>>>>>> + is a per-port mirroring, otherwise it's a per-flow mirroring > >>>>>>> + </column> > >>>>>>> + > >>>>>>> + <column name="flow_dst_mac"> > >>>>>>> + The destination MAC address(es) for per-flow mirroring. Each > >> MAC > >>>>>>> + address is separate by ','. This parametr is paired with > >>>>>>> + select_src_port. A '0' MAC address indicates the requested > >> mirror > >>>>>>> + is a per-port mirroring, otherwise it's a per-flow mirroring > >>>>>>> + </column> > >>>>>>> + > >>>>>>> <column name="select_dst_port"> > >>>>>>> Ports on which departing packets are selected for mirroring. > >>>>>>> </column> > >>>>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface > p0 > >>>>>> type=patch options:peer=p1 \ > >>>>>>> </p> > >>>>>>> </column> > >>>>>>> > >>>>>>> + <column name="output_src_vlan"> > >>>>>>> + <p>Output VLAN for selected source port packets, if > >>>> nonempty.</p> > >>>>>>> + <p> > >>>>>>> + <em>Please note:</em> This is different than > >>>>>>> + <ref column="output-vlan"/> This vlan is used to add an > >> additional > >>>>>>> + vlan tag on the mirror traffic, regardless it contains > >>>>>>> vlan or > not. > >>>>>>> + The receive end could choose to filter out this additional > >>>>>>> vlan. > >>>>>>> + This option is provided so the mirrored traffic could > >>>>>>> maintain > its > >>>>>>> + original vlan informaiton, and this mirror can be used to > >>>>>>> filter > >>>>>>> + out un-wanted traffic such as in <ref > >> column="mirror_offload"/>. > >>>>>>> + </p> > >>>>>>> + </column> > >>>>>>> + > >>>>>>> + <column name="output_dst_vlan"> > >>>>>>> + <p>Output VLAN for selected destination port packets, if > >>>>>> nonempty.</p> > >>>>>>> + <p> > >>>>>>> + <em>Please note:</em> This is different than > >>>>>>> + <ref column="output-vlan"/> This vlan is used to add an > >> additional > >>>>>>> + vlan tag on the mirror traffic, regardless it contains > >>>>>>> vlan or > not. > >>>>>>> + The receive end could choose to filter out this additional > >>>>>>> vlan. > >>>>>>> + This option is provided so the mirrored traffic could > >>>>>>> maintain > its > >>>>>>> + original vlan informaiton, and this mirror cab be used to > >>>>>>> filter > >>>>>>> + out un-wanted traffic such as in <ref > >> column="mirror_offload"/>. > >>>>>>> + </p> > >>>>>>> + </column> > >>>>>>> + > >>>>>>> <column name="snaplen"> > >>>>>>> <p>Maximum per-packet number of bytes to mirror.</p> > >>>>>>> <p>A mirrored packet with size larger than <ref > >>>> column="snaplen"/> > >>>>>>> > >>>>> > >>>> > >>>> These parameters are DPDK specific, but nothing mentions it. It would > >>>> confuse the OVS-Kernel users. > >>>> > >>> > >>> The current implementation only supports OVS-DPDK. > >> > >> Yes, that is my point. It is not mentioned in the documentation that it > >> is DPDK specific, and we should not expect the user to dive into the > >> code to understand it is not supported with OVS Kernel. > >> > > > > Got it. Will add this caveat in the document. Thanks. > > Thanks! > Maxime _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev