On 09.04.2019 11:34, Ilya Maximets wrote:
> On 08.04.2019 16:44, David Marchand wrote:
>> Hello Ilya,
>>
>> On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets <i.maxim...@samsung.com> wrote:
>>
>>     On 04.04.2019 22:49, David Marchand wrote:
>>     > We tried to lower the number of rebalances but we don't have a
>>     > satisfying solution at the moment, so this patch rebalances on each
>>     > update.
>>
>>     Hi.
>>
>>     Triggering the reconfiguration on each vring state change is a bad thing.
>>     This could be abused by the guest to break the host networking by
>>     infinitely disabling/enabling queues. Each reconfiguration leads to
>>     removing ports from the PMD port caches and reloading them. On
>>     rescheduling, all the ports
>>
>>
>> I'd say the reconfiguration itself is not wanted here.
>> Only rebalancing the queues would be enough.
> 
> As you correctly mentioned in the commit message, there will be no real port
> configuration changes.
> 
> By 'reconfiguration' I mean the datapath reconfiguration, and 'rebalancing'
> is one of its stages.
> 
>>
>>
>>     could be moved to different PMD threads, resulting in EMC/SMC/dpcls
>>     invalidation and subsequent upcalls/packet reorderings.
>>
>>
>> I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when moving
>> queues.
>> However, EMC/SMC/dpcls are per-PMD, so where would we have packet
>> reordering?
> 
> An Rx queue could be scheduled to a different PMD thread --> new packets will
> go to a different Tx queue. It's unlikely, but it will depend on the
> device/driver which packets will be sent first. The main issue here is that
> this happens to other ports, not only to the port we're trying to
> reconfigure.
> 
>>
>>
>>
>>     The same issue was discussed previously while looking at the possibility
>>     of vhost-pmd integration (with some test results):
>>     https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html
>>
>>
>> Thanks for the link, I will test this.
>>
>>
>>
>>     One more reference:
>>     7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of
>>     same vhost device.")
>>
>>
>> Yes, I saw this patch.
>> Are we safe against guest drivers/applications that play with
>> VIRTIO_NET_F_MQ, swapping it continuously?
> 
> Good point. I didn't test that, but it looks like we're not safe here.
> Kernel and DPDK drivers have F_MQ enabled by default, so it'll require
> some changes/explicit disabling. But I agree that this could produce
> issues if someone does that.
> 
> We could probably handle this using a 'max seen qp_num' approach. But I'm
> not a fan of this. We need to figure out how to do this correctly.
> 
> In general, I think that we should not allow cases where the guest is able
> to manipulate the host configuration.
> 
>>
>>
>>
>>
>>     Anyway, do you have some numbers on how much time a PMD thread spends on
>>     polling disabled queues? What performance improvement are you able to
>>     achieve by avoiding that?
>>
>>
>> With a simple PVP setup of mine:
>> 1c/2t poll two physical ports.
>> 1c/2t poll four vhost ports with 16 queues each.
>>   Only one queue is enabled on each virtio device attached by the guest.
>>   The first two virtio devices are bound to the virtio kmod.
>>   The last two virtio devices are bound to vfio-pci and used to forward
>>   incoming traffic with testpmd.
>>
>> The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost queues) 
>> to 6.2Mpps (polling only the 4 enabled vhost queues).
> 
> That's interesting. However, this doesn't look like a realistic scenario.
> In practice you'll need many more PMD threads to handle so many queues.
> If you add more threads, the zeroloss test could show even worse results
> if one of the idle VMs periodically changes the number of queues. Periodic
> latency spikes will cause queue overruns and subsequent packet drops on
> hot Rx queues. This could be partially solved by allowing n_rxq to grow only.
> However, I'd be happy to have a different solution that does not hide the
> number of queues from the datapath.

What do you think about something like this (not even compile tested):
---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index bd9718824..0cf492a3b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -591,6 +591,8 @@ struct polled_queue {
     struct dp_netdev_rxq *rxq;
     odp_port_t port_no;
     bool emc_enabled;
+    bool enabled;
+    uint64_t change_seq;
 };
 
 /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
@@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
         poll_list[i].rxq = poll->rxq;
         poll_list[i].port_no = poll->rxq->port->port_no;
         poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
+        poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
+        poll_list[i].change_seq =
+                     netdev_get_change_seq(poll->rxq->port->netdev);
         i++;
     }
 
@@ -5440,6 +5445,10 @@ reload:
 
         for (i = 0; i < poll_cnt; i++) {
 
+            if (!poll_list[i].enabled) {
+                continue;
+            }
+
             if (poll_list[i].emc_enabled) {
                 atomic_read_relaxed(&pmd->dp->emc_insert_min,
                                     &pmd->ctx.emc_insert_min);
@@ -5476,6 +5485,15 @@ reload:
             if (reload) {
                 break;
             }
+
+            for (i = 0; i < poll_cnt; i++) {
+                uint64_t current_seq =
+                         netdev_get_change_seq(poll_list[i].rxq->port->netdev);
+                if (poll_list[i].change_seq != current_seq) {
+                    poll_list[i].change_seq = current_seq;
+                    poll_list[i].enabled = netdev_rxq_enabled(poll_list[i].rxq);
+                }
+            }
         }
         pmd_perf_end_iteration(s, rx_packets, tx_packets,
                                pmd_perf_metrics_enabled(pmd));
---
+ replacing your 'netdev_request_reconfigure(&dev->up)' inside
'vring_state_changed' with 'netdev_change_seq_changed(&dev->up)'?

This should help to avoid reconfiguring/rescheduling everything and to not
waste much time on polling. It will also allow reacting faster to queue state
changes.
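
Just to illustrate the polling side of this idea outside of OVS, here is a
minimal self-contained sketch (all names below are made up for illustration,
it's not OVS code): each polled queue caches an 'enabled' flag, and the flag is
re-read only when the owning device's change sequence number has moved, so the
hot loop stays cheap.
---
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state: a sequence counter bumped on any state change
 * plus one enable flag per queue, both written from the vring state
 * callback and read by the polling threads. */
struct demo_dev {
    atomic_uint_fast64_t change_seq;
    atomic_bool rxq_enabled[16];
};

/* Hypothetical per-thread cache entry, mirroring 'struct polled_queue'. */
struct demo_polled_queue {
    struct demo_dev *dev;
    int qid;
    bool enabled;          /* Cached copy of dev->rxq_enabled[qid]. */
    uint64_t change_seq;   /* Last seen value of dev->change_seq. */
};

/* One polling iteration: skip disabled queues, then refresh the cached
 * flags only for queues whose device reported a state change. */
static void
demo_poll_iteration(struct demo_polled_queue *poll_list, size_t poll_cnt)
{
    size_t i;

    for (i = 0; i < poll_cnt; i++) {
        if (!poll_list[i].enabled) {
            continue;   /* Don't waste cycles on a disabled queue. */
        }
        /* ... receive and process packets from queue poll_list[i].qid ... */
    }

    for (i = 0; i < poll_cnt; i++) {
        uint64_t seq = atomic_load_explicit(&poll_list[i].dev->change_seq,
                                            memory_order_acquire);

        if (seq != poll_list[i].change_seq) {
            poll_list[i].change_seq = seq;
            poll_list[i].enabled = atomic_load_explicit(
                &poll_list[i].dev->rxq_enabled[poll_list[i].qid],
                memory_order_relaxed);
        }
    }
}
---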

After that we'll be able to fix reconfiguration on F_MQ manipulations with
something simple like this:
---
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 91af377e8..1937561cc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3492,8 +3528,8 @@ new_device(int vid)
                 newnode = dev->socket_id;
             }
 
-            if (dev->requested_n_txq != qp_num
-                || dev->requested_n_rxq != qp_num
+            if (dev->requested_n_txq < qp_num
+                || dev->requested_n_rxq < qp_num
                 || dev->requested_socket_id != newnode) {
                 dev->requested_socket_id = newnode;
                 dev->requested_n_rxq = qp_num;
---
Decreasing 'qp_num' will not cause big issues because we will not poll
disabled queues.
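
And for completeness, the producer side of the same hypothetical sketch,
reusing 'struct demo_dev' from above (again, made-up names rather than OVS
code): the vring state callback only flips the per-queue flag and bumps the
sequence counter instead of requesting a datapath reconfiguration.
---
/* Hypothetical counterpart of vring_state_changed() for the sketch above:
 * called when the guest enables or disables a queue. */
static void
demo_vring_state_changed(struct demo_dev *dev, int qid, bool enable)
{
    atomic_store_explicit(&dev->rxq_enabled[qid], enable,
                          memory_order_relaxed);
    /* Bumping the sequence counter plays the role of
     * netdev_change_seq_changed(): polling threads notice it on their next
     * iteration and refresh their cached 'enabled' flags, without any
     * datapath reconfiguration or queue rescheduling. */
    atomic_fetch_add_explicit(&dev->change_seq, 1, memory_order_release);
}
---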

And there could be one separate change to drop the requested values if the
socket connection is closed (I hope the guest is not able to control that):
---
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 91af377e8..1937561cc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -185,13 +185,16 @@ static const struct rte_eth_conf port_conf = {
  */
 static int new_device(int vid);
 static void destroy_device(int vid);
+static void destroy_connection(int vid);
 static int vring_state_changed(int vid, uint16_t queue_id, int enable);
 static const struct vhost_device_ops virtio_net_device_ops =
 {
     .new_device =  new_device,
     .destroy_device = destroy_device,
     .vring_state_changed = vring_state_changed,
-    .features_changed = NULL
+    .features_changed = NULL,
+    .new_connection = NULL,
+    .destroy_connection = destroy_connection
 };
 
 enum { DPDK_RING_SIZE = 256 };
@@ -3462,6 +3465,39 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
     free(enabled_queues);
 }
 
+/*
+ * vhost-user socket connection is closed.
+ */
+static void
+destroy_connection(int vid)
+{
+    struct netdev_dpdk *dev;
+    char ifname[IF_NAME_SZ];
+
+    rte_vhost_get_ifname(vid, ifname, sizeof ifname);
+
+    ovs_mutex_lock(&dpdk_mutex);
+    LIST_FOR_EACH(dev, list_node, &dpdk_list) {
+        ovs_mutex_lock(&dev->mutex);
+        if (nullable_string_is_equal(ifname, dev->vhost_id)) {
+            uint32_t qp_num = NR_QUEUE;
+
+            /* Restore the number of queue pairs to default. */
+            if (dev->requested_n_txq != qp_num
+                || dev->requested_n_rxq != qp_num) {
+                dev->requested_n_rxq = qp_num;
+                dev->requested_n_txq = qp_num;
+                netdev_request_reconfigure(&dev->up);
+            }
+            ovs_mutex_unlock(&dev->mutex);
+            break;
+        }
+        ovs_mutex_unlock(&dev->mutex);
+    }
+    ovs_mutex_unlock(&dpdk_mutex);
+}
+
+
 /*
  * A new virtio-net device is added to a vhost port.
  */
---

Best regards, Ilya Maximets.