Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-11 Thread David Marchand
On Wed, Apr 10, 2019 at 4:24 PM Ilya Maximets 
wrote:

> On 10.04.2019 17:10, David Marchand wrote:
> > Incorporated this change, and it works fine, I get similar numbers
> with/without a bitmap, so I dropped the bitmap.
> > I would say we are only missing some information in
> dpif-netdev/pmd-rxq-show to get a per rxq status.
> >
> > I will submit this with the fix on F_MQ if this is okay for you.
>
> Sure.
> I'll make a more detailed review and, probably, some tests after that.
>
>
Sent in the series:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=102214
Thanks.

-- 
David Marchand


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Jan Scheurich
> >
> > I am afraid it is not a valid assumption that there will be a similarly large
> number of OVS PMD threads as there are queues.
> >
> > In OpenStack deployments the OVS is typically statically configured to use a
> few dedicated host CPUs for PMDs (perhaps 2-8).
> >
> > Typical Telco VNF VMs, on the other hand, are very large (12-20 vCPUs or
> even more). If they enable an instance for multi-queue in Nova, Nova (in its
> eternal wisdom) will set up every vhostuser port with #vCPU queue pairs.
> 
> For me, it's an issue of Nova. It's pretty easy to limit the maximum number of
> queue pairs to some sane value (the value that could be handled by your
> number of available PMD threads).
> It'll be one config option and a small patch to nova-compute. With a bit more work
> you could make this per-port configurable and finally stop wasting HW
> resources.

OK, I fully agree. 
The OpenStack community is slow, though, when it comes to this kind of change.
Do we have contacts we could push?

> 
> > A (real world) VM with 20 vCPUs and 6 ports would have 120 queue pairs,
> even if only one or two high-traffic ports can actually profit from 
> multi-queue.
> Even on those ports it is unlikely that the application will use all 16 
> queues. And
> often there would be another such VM on the second NUMA node.
> 
> By limiting the number of queues in Nova (like I described above) to 4, you'll
> have just 24 queues for 6 ports. If you make it per-port, you'll be able to
> limit this to even saner values.

Yes, per-port configuration in Neutron seems the logical thing for me to do,
rather than a global per-instance parameter in the Nova flavor. A per-server
setting in Nova compute to limit the number of acceptable queue pairs to
match the OVS configuration might still be useful on top.

> 
> >
> > So, as soon as a VNF enables MQ in OpenStack, there will typically be a vast
> number of unused queue pairs in OVS and it makes a lot of sense to minimize
> the run-time impact of having these around.
> 
> To me, this doesn't seem like an OVS, DPDK or QEMU issue. The orchestrator
> should configure sane values in the first place. It's totally unclear why we're
> changing OVS instead of changing Nova.

The VNF orchestrator would request queues based on the application's needs. It
should not need to be aware of the configuration of the infrastructure (such as
the number of PMD threads in OVS). The OpenStack operator would have to make
sure that the instantiated queues are a good compromise between application
needs and infra capabilities.

> 
> >
> > We have had discussions earlier with Red Hat as to how a vhostuser backend
> like OVS could negotiate the number of queue pairs with Qemu down to a
> reasonable value (e.g. the number of PMDs available for polling) *before* Qemu
> would actually start the guest. The guest would then not have to guess at the
> optimal number of queue pairs to actually activate.
> >
> > BR, Jan
> >


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Ilya Maximets
On 10.04.2019 17:10, David Marchand wrote:
> On Wed, Apr 10, 2019 at 9:32 AM David Marchand wrote:
> 
> On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets wrote:
> 
> 
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
>      struct dp_netdev_rxq *rxq;
>      odp_port_t port_no;
>      bool emc_enabled;
> +    bool enabled;
> +    uint64_t change_seq;
>  };
> 
>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct 
> dp_netdev_pmd_thread *pmd,
>          poll_list[i].rxq = poll->rxq;
>          poll_list[i].port_no = poll->rxq->port->port_no;
>          poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> +        poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
> +        poll_list[i].change_seq =
> +                     netdev_get_change_seq(poll->rxq->port->netdev);
>          i++;
>      }
> 
> @@ -5440,6 +5445,10 @@ reload:
> 
>          for (i = 0; i < poll_cnt; i++) {
> 
> +            if (!poll_list[i].enabled) {
> +                continue;
> +            }
> +
>              if (poll_list[i].emc_enabled) {
>                  atomic_read_relaxed(&pmd->dp->emc_insert_min,
>                                      &pmd->ctx.emc_insert_min);
> @@ -5476,6 +5485,15 @@ reload:
>              if (reload) {
>                  break;
>              }
> +
> +            for (i = 0; i < poll_cnt; i++) {
> +                uint64_t current_seq =
> +                         
> netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> +                if (poll_list[i].change_seq != current_seq) {
> +                    poll_list[i].change_seq = current_seq;
> +                    poll_list[i].enabled = 
> netdev_rxq_enabled(poll_list[i].rxq);
> +                }
> +            }
>          }
>          pmd_perf_end_iteration(s, rx_packets, tx_packets,
>                                 pmd_perf_metrics_enabled(pmd));
> ---
> + replacing your 'netdev_request_reconfigure(&dev->up)' inside 
> 'vring_state_changed'
> with 'netdev_change_seq_changed(&dev->up)'?
> 
> 
> Interesting, I would prefer to have a bitmap of rxq rather than looping 
> on all and checking the state.
> But let me try your approach first.
> 
> 
> Incorporated this change, and it works fine, I get similar numbers 
> with/without a bitmap, so I dropped the bitmap.
> I would say we are only missing some information in dpif-netdev/pmd-rxq-show 
> to get a per rxq status.
> 
> I will submit this with the fix on F_MQ if this is okay for you.

Sure.
I'll make a more detailed review and, probably, some tests after that.

> 
> Thanks!
> 
> -- 
> David Marchand


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread David Marchand
On Wed, Apr 10, 2019 at 9:32 AM David Marchand 
wrote:

> On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets 
> wrote:
>
>>
>> What do you think about something like this (not even compile tested):
>> ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index bd9718824..0cf492a3b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -591,6 +591,8 @@ struct polled_queue {
>>  struct dp_netdev_rxq *rxq;
>>  odp_port_t port_no;
>>  bool emc_enabled;
>> +bool enabled;
>> +uint64_t change_seq;
>>  };
>>
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct
>> dp_netdev_pmd_thread *pmd,
>>  poll_list[i].rxq = poll->rxq;
>>  poll_list[i].port_no = poll->rxq->port->port_no;
>>  poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>> +poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
>> +poll_list[i].change_seq =
>> + netdev_get_change_seq(poll->rxq->port->netdev);
>>  i++;
>>  }
>>
>> @@ -5440,6 +5445,10 @@ reload:
>>
>>  for (i = 0; i < poll_cnt; i++) {
>>
>> +if (!poll_list[i].enabled) {
>> +continue;
>> +}
>> +
>>  if (poll_list[i].emc_enabled) {
>>  atomic_read_relaxed(&pmd->dp->emc_insert_min,
>>  &pmd->ctx.emc_insert_min);
>> @@ -5476,6 +5485,15 @@ reload:
>>  if (reload) {
>>  break;
>>  }
>> +
>> +for (i = 0; i < poll_cnt; i++) {
>> +uint64_t current_seq =
>> +
>>  netdev_get_change_seq(poll_list[i].rxq->port->netdev);
>> +if (poll_list[i].change_seq != current_seq) {
>> +poll_list[i].change_seq = current_seq;
>> +poll_list[i].enabled =
>> netdev_rxq_enabled(poll_list[i].rxq);
>> +}
>> +}
>>  }
>>  pmd_perf_end_iteration(s, rx_packets, tx_packets,
>> pmd_perf_metrics_enabled(pmd));
>> ---
>> + replacing your 'netdev_request_reconfigure(&dev->up)' inside
>> 'vring_state_changed'
>> with 'netdev_change_seq_changed(&dev->up)'?
>>
>
> Interesting, I would prefer to have a bitmap of rxq rather than looping on
> all and checking the state.
> But let me try your approach first.
>

Incorporated this change, and it works fine, I get similar numbers
with/without a bitmap, so I dropped the bitmap.
I would say we are only missing some information in
dpif-netdev/pmd-rxq-show to get a per-rxq status.
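
As a purely illustrative sketch of what that could look like, the pmd-rxq-show
formatter could print the enabled state next to each queue. The surrounding
variable names below ('reply', 'rxq') are assumptions, not the eventual patch:

    /* Sketch only: print a per-rxq enabled/disabled marker in the
     * dpif-netdev/pmd-rxq-show output.  'reply' is the dynamic string the
     * formatter fills in; 'rxq' is a struct dp_netdev_rxq *. */
    ds_put_format(reply, "  port: %-16s  queue-id: %2d (%s)\n",
                  netdev_rxq_get_name(rxq->rx),
                  netdev_rxq_get_queue_id(rxq->rx),
                  netdev_rxq_enabled(rxq->rx) ? "enabled" : "disabled");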

I will submit this with the fix on F_MQ if this is okay for you.

Thanks!

-- 
David Marchand


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Jan Scheurich
Hi Ilya,

> >
> > With a simple pvp setup of mine.
> > 1c/2t poll two physical ports.
> > 1c/2t poll four vhost ports with 16 queues each.
> >   Only one queue is enabled on each virtio device attached by the guest.
> >   The first two virtio devices are bound to the virtio kmod.
> >   The last two virtio devices are bound to vfio-pci and used to forward
> incoming traffic with testpmd.
> >
> > The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost queues)
> to 6.2Mpps (polling only the 4 enabled vhost queues).
> 
> That's interesting. However, this doesn't look like a realistic scenario.
> In practice you'll need many more PMD threads to handle so many queues.
> If you add more threads, a zero-loss test could show even worse results if one
> of the idle VMs periodically changes the number of queues. Periodic latency
> spikes will cause queue overruns and subsequent packet drops on hot Rx queues.
> This could be partially solved by allowing n_rxq to grow only.
> However, I'd be happy to have a different solution that will not hide the
> number of queues from the datapath.
> 

I am afraid it is not a valid assumption that there will be a similarly large
number of OVS PMD threads as there are queues.

In OpenStack deployments the OVS is typically statically configured to use a 
few dedicated host CPUs for PMDs (perhaps 2-8).

Typical Telco VNF VMs, on the other hand, are very large (12-20 vCPUs or even 
more). If they enable an instance for multi-queue in Nova, Nova (in its eternal 
wisdom) will set up every vhostuser port with #vCPU queue pairs. A (real world) 
VM with 20 vCPUs and 6 ports would have 120 queue pairs, even if only one or 
two high-traffic ports can actually profit from multi-queue. Even on those
ports it is unlikely that the application will use all 16 queues. And often 
there would be another such VM on the second NUMA node.

So, as soon as a VNF enables MQ in OpenStack, there will typically be a vast 
number of unused queue pairs in OVS and it makes a lot of sense to minimize 
the run-time impact of having these around. 

We have had discussions earlier with Red Hat as to how a vhostuser backend like
OVS could negotiate the number of queue pairs with Qemu down to a reasonable
value (e.g. the number of PMDs available for polling) *before* Qemu would
actually start the guest. The guest would then not have to guess at the optimal
number of queue pairs to actually activate.

BR, Jan


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Ilya Maximets
On 09.04.2019 17:49, Kevin Traynor wrote:
> On 09/04/2019 12:41, Ilya Maximets wrote:
>> On 09.04.2019 11:34, Ilya Maximets wrote:
>>> On 08.04.2019 16:44, David Marchand wrote:
 Hello Ilya,

 On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets wrote:

 On 04.04.2019 22:49, David Marchand wrote:
 > We tried to lower the number of rebalances but we don't have a
 > satisfying solution at the moment, so this patch rebalances on each
 > update.

 Hi.

 Triggering the reconfiguration on each vring state change is a bad 
 thing.
 This could be abused by the guest to break the host networking by 
 infinite
 disabling/enabling queues. Each reconfiguration leads to removing ports
 from the PMD port caches and their reloads. On rescheduling all the 
 ports


 I'd say the reconfiguration itself is not wanted here.
 Only rebalancing the queues would be enough.
>>>
>>> As you correctly mentioned in the commit message, there will be no real port
>>> configuration changes.
>>>
>>> Under 'reconfiguration' I mean datapath reconfiguration and 'rebalancing'
>>> is one of its stages.
>>>


 could be moved to a different PMD threads resulting in EMC/SMC/dpcls
 invalidation and subsequent upcalls/packet reorderings.


 I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when 
 moving queues.
 However, EMC/SMC/dpcls are per pmd specific, where would we have packet 
 reordering ?
>>>
>>> An Rx queue could be rescheduled to a different PMD thread --> new packets
>>> will go to a different Tx queue. It's unlikely, but which packets get sent
>>> first will depend on the device/driver. The main issue here is that it
>>> happens to other ports, not only to the port we're trying to reconfigure.
>>>



 The same issues were discussed previously while looking at the possibility of
 vhost-pmd integration (with some test results):
 https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html


 Thanks for the link, I will test this.



 One more reference:
 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of 
 same vhost device.")


 Yes, I saw this patch.
 Are we safe against guest drivers/applications that play with 
 VIRTIO_NET_F_MQ, swapping it continuously ?
>>>
>>> Good point. I didn't test that, but it looks like we're not safe here.
>>> Kernel and DPDK drivers have F_MQ enabled by default so it'll require
>>> some changes/explicit disabling. But I agree that this could produce
>>> issues if someone does that.
>>>
>>> We could probably handle this using 'max seen qp_num' approach. But I'm
>>> not a fan of this. Need to figure out how to do this correctly.
>>>
>>> In general, I think that we should not allow cases where the guest is able
>>> to manipulate the host configuration.
>>>




 Anyway, do you have some numbers on how much time the PMD thread spends on
 polling disabled queues? What performance improvement are you able to achieve
 by avoiding that?


 With a simple pvp setup of mine.
 1c/2t poll two physical ports.
 1c/2t poll four vhost ports with 16 queues each.
   Only one queue is enabled on each virtio device attached by the guest.
   The first two virtio devices are bound to the virtio kmod.
   The last two virtio devices are bound to vfio-pci and used to forward 
 incoming traffic with testpmd.

 The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost 
 queues) to 6.2Mpps (polling only the 4 enabled vhost queues).
>>>
>>> That's interesting. However, this doesn't look like a realistic scenario.
>>> In practice you'll need many more PMD threads to handle so many queues.
>>> If you add more threads, a zero-loss test could show even worse results
>>> if one of the idle VMs periodically changes the number of queues. Periodic
>>> latency spikes will cause queue overruns and subsequent packet drops on
>>> hot Rx queues. This could be partially solved by allowing n_rxq to grow only.
>>> However, I'd be happy to have a different solution that will not hide the
>>> number of queues from the datapath.
>>
>> What do you think about something like this (not even compile tested):
>> ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index bd9718824..0cf492a3b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -591,6 +591,8 @@ struct polled_queue {
>>  struct dp_netdev_rxq *rxq;
>>  odp_port_t port_no;
>>  bool emc_enabled;
>> +bool enabled;
>> +uint64_t change_seq;
>>  };
>>  
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
>> *pmd,
>>  

Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread David Marchand
Hello Ilya,

On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets  wrote:

>
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
>  struct dp_netdev_rxq *rxq;
>  odp_port_t port_no;
>  bool emc_enabled;
> +bool enabled;
> +uint64_t change_seq;
>  };
>
>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct
> dp_netdev_pmd_thread *pmd,
>  poll_list[i].rxq = poll->rxq;
>  poll_list[i].port_no = poll->rxq->port->port_no;
>  poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> +poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
> +poll_list[i].change_seq =
> + netdev_get_change_seq(poll->rxq->port->netdev);
>  i++;
>  }
>
> @@ -5440,6 +5445,10 @@ reload:
>
>  for (i = 0; i < poll_cnt; i++) {
>
> +if (!poll_list[i].enabled) {
> +continue;
> +}
> +
>  if (poll_list[i].emc_enabled) {
>  atomic_read_relaxed(&pmd->dp->emc_insert_min,
>  &pmd->ctx.emc_insert_min);
> @@ -5476,6 +5485,15 @@ reload:
>  if (reload) {
>  break;
>  }
> +
> +for (i = 0; i < poll_cnt; i++) {
> +uint64_t current_seq =
> +
>  netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> +if (poll_list[i].change_seq != current_seq) {
> +poll_list[i].change_seq = current_seq;
> +poll_list[i].enabled =
> netdev_rxq_enabled(poll_list[i].rxq);
> +}
> +}
>  }
>  pmd_perf_end_iteration(s, rx_packets, tx_packets,
> pmd_perf_metrics_enabled(pmd));
> ---
> + replacing your 'netdev_request_reconfigure(&dev->up)' inside
> 'vring_state_changed'
> with 'netdev_change_seq_changed(&dev->up)'?
>

Interesting, I would prefer to have a bitmap of rxq rather than looping on
all and checking the state.
But let me try your approach first.
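
For reference, the per-rxq state checked in that loop comes from a netdev-level
query. A minimal netdev_rxq_enabled() helper could look like the sketch below,
assuming an optional per-class 'rxq_enabled' callback; this is a sketch of the
proposed series, not code that exists in OVS today:

    bool
    netdev_rxq_enabled(struct netdev_rxq *rx)
    {
        bool enabled = true;

        if (rx->netdev->netdev_class->rxq_enabled) {
            /* Only netdev classes that can report per-queue state
             * (e.g. vhost) implement this; others default to enabled. */
            enabled = rx->netdev->netdev_class->rxq_enabled(rx);
        }

        return enabled;
    }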



> This should help to not reconfigure/reschedule everything and not waste
> much time on
> polling. Also this will give the ability to react faster to queue state
> changes.
>

I suspect the additional branch could still have an impact, but I won't
know before trying.
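
For context on the change_seq check in the quoted diff: netdev_change_seq_changed()
only bumps a per-netdev sequence number, so the per-iteration cost on the PMD
side is a load and a compare against the cached value. Roughly, based on the
generic netdev-provider helper (details may differ between versions):

    static inline void
    netdev_change_seq_changed(const struct netdev *netdev_)
    {
        struct netdev *netdev = CONST_CAST(struct netdev *, netdev_);

        seq_change(connectivity_seq_get());
        /* Bump the per-netdev sequence number, skipping 0 on wrap-around. */
        netdev->change_seq++;
        if (!netdev->change_seq) {
            netdev->change_seq = 1;
        }
    }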



> After that we'll be able to fix reconfiguration on F_MQ manipulations with
> something
> simple like that:
> ---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3492,8 +3528,8 @@ new_device(int vid)
>  newnode = dev->socket_id;
>  }
>
> -if (dev->requested_n_txq != qp_num
> -|| dev->requested_n_rxq != qp_num
> +if (dev->requested_n_txq < qp_num
> +|| dev->requested_n_rxq < qp_num
>  || dev->requested_socket_id != newnode) {
>  dev->requested_socket_id = newnode;
>  dev->requested_n_rxq = qp_num;
> ---
> Decreasing 'qp_num' will not cause big issues because we'll not poll
> disabled queues.
>

Yes.



> And there could be one separate change to drop the requested values if the
> socket connection is closed (I hope that the guest is not able to control
> that):
>

I don't think the guest can control this part.


---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -185,13 +185,16 @@ static const struct rte_eth_conf port_conf = {
>   */
>  static int new_device(int vid);
>  static void destroy_device(int vid);
> +static void destroy_connection(int vid);
>  static int vring_state_changed(int vid, uint16_t queue_id, int enable);
>  static const struct vhost_device_ops virtio_net_device_ops =
>  {
>  .new_device =  new_device,
>  .destroy_device = destroy_device,
>  .vring_state_changed = vring_state_changed,
> -.features_changed = NULL
> +.features_changed = NULL,
> +.new_connection = NULL,
> +.destroy_connection = destroy_connection
>  };
>
>  enum { DPDK_RING_SIZE = 256 };
> @@ -3462,6 +3465,39 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
>  free(enabled_queues);
>  }
>
> +/*
> + * vhost-user socket connection is closed.
> + */
> +static void
> +destroy_connection(int vid)
> +{
> +struct netdev_dpdk *dev;
> +char ifname[IF_NAME_SZ];
> +
> +rte_vhost_get_ifname(vid, ifname, sizeof ifname);
> +
> +ovs_mutex_lock(&dpdk_mutex);
> +LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +ovs_mutex_lock(&dev->mutex);
> +if (nullable_string_is_equal(ifname, dev->vhost_id)) {
> +uint32_t qp_num = NR_QUEUE;
> +
> +/* Restore the 

Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-09 Thread Kevin Traynor
On 09/04/2019 12:41, Ilya Maximets wrote:
> On 09.04.2019 11:34, Ilya Maximets wrote:
>> On 08.04.2019 16:44, David Marchand wrote:
>>> Hello Ilya,
>>>
>>> On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets wrote:
>>>
>>> On 04.04.2019 22:49, David Marchand wrote:
>>> > We tried to lower the number of rebalances but we don't have a
>>> > satisfying solution at the moment, so this patch rebalances on each
>>> > update.
>>>
>>> Hi.
>>>
>>> Triggering the reconfiguration on each vring state change is a bad 
>>> thing.
>>> This could be abused by the guest to break the host networking by 
>>> infinite
>>> disabling/enabling queues. Each reconfiguration leads to removing ports
>>> from the PMD port caches and their reloads. On rescheduling all the 
>>> ports
>>>
>>>
>>> I'd say the reconfiguration itself is not wanted here.
>>> Only rebalancing the queues would be enough.
>>
>> As you correctly mentioned in the commit message, there will be no real port
>> configuration changes.
>>
>> Under 'reconfiguration' I mean datapath reconfiguration and 'rebalancing'
>> is one of its stages.
>>
>>>
>>>
>>> could be moved to a different PMD threads resulting in EMC/SMC/dpcls
>>> invalidation and subsequent upcalls/packet reorderings.
>>>
>>>
>>> I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when 
>>> moving queues.
>>> However, EMC/SMC/dpcls are per pmd specific, where would we have packet 
>>> reordering ?
>>
>> An Rx queue could be rescheduled to a different PMD thread --> new packets
>> will go to a different Tx queue. It's unlikely, but which packets get sent
>> first will depend on the device/driver. The main issue here is that it
>> happens to other ports, not only to the port we're trying to reconfigure.
>>
>>>
>>>
>>>
>>> The same issues were discussed previously while looking at the possibility of
>>> vhost-pmd integration (with some test results):
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html
>>>
>>>
>>> Thanks for the link, I will test this.
>>>
>>>
>>>
>>> One more reference:
>>> 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of 
>>> same vhost device.")
>>>
>>>
>>> Yes, I saw this patch.
>>> Are we safe against guest drivers/applications that play with 
>>> VIRTIO_NET_F_MQ, swapping it continuously ?
>>
>> Good point. I didn't test that, but it looks like we're not safe here.
>> Kernel and DPDK drivers have F_MQ enabled by default so it'll require
>> some changes/explicit disabling. But I agree that this could produce
>> issues if someone does that.
>>
>> We could probably handle this using 'max seen qp_num' approach. But I'm
>> not a fan of this. Need to figure out how to do this correctly.
>>
>> In general, I think that we should not allow cases where the guest is able
>> to manipulate the host configuration.
>>
>>>
>>>
>>>
>>>
>>> Anyway, do you have some numbers on how much time the PMD thread spends on
>>> polling disabled queues? What performance improvement are you able to achieve
>>> by avoiding that?
>>>
>>>
>>> With a simple pvp setup of mine.
>>> 1c/2t poll two physical ports.
>>> 1c/2t poll four vhost ports with 16 queues each.
>>>   Only one queue is enabled on each virtio device attached by the guest.
>>>   The first two virtio devices are bound to the virtio kmod.
>>>   The last two virtio devices are bound to vfio-pci and used to forward 
>>> incoming traffic with testpmd.
>>>
>>> The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost 
>>> queues) to 6.2Mpps (polling only the 4 enabled vhost queues).
>>
>> That's interesting. However, this doesn't look like a realistic scenario.
>> In practice you'll need many more PMD threads to handle so many queues.
>> If you add more threads, a zero-loss test could show even worse results
>> if one of the idle VMs periodically changes the number of queues. Periodic
>> latency spikes will cause queue overruns and subsequent packet drops on
>> hot Rx queues. This could be partially solved by allowing n_rxq to grow only.
>> However, I'd be happy to have a different solution that will not hide the
>> number of queues from the datapath.
> 
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
>  struct dp_netdev_rxq *rxq;
>  odp_port_t port_no;
>  bool emc_enabled;
> +bool enabled;
> +uint64_t change_seq;
>  };
>  
>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
> *pmd,
>  poll_list[i].rxq = poll->rxq;
>  poll_list[i].port_no = poll->rxq->port->port_no;
>  poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> +

Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-09 Thread Ilya Maximets
On 08.04.2019 16:44, David Marchand wrote:
> Hello Ilya,
> 
> On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets wrote:
> 
> On 04.04.2019 22:49, David Marchand wrote:
> > We tried to lower the number of rebalances but we don't have a
> > satisfying solution at the moment, so this patch rebalances on each
> > update.
> 
> Hi.
> 
> Triggering the reconfiguration on each vring state change is a bad thing.
> This could be abused by the guest to break the host networking by infinite
> disabling/enabling queues. Each reconfiguration leads to removing ports
> from the PMD port caches and their reloads. On rescheduling all the ports
> 
> 
> I'd say the reconfiguration itself is not wanted here.
> Only rebalancing the queues would be enough.

As you correctly mentioned in the commit message, there will be no real port
configuration changes.

Under 'reconfiguration' I mean datapath reconfiguration and 'rebalancing'
is one of its stages.

> 
> 
> could be moved to a different PMD threads resulting in EMC/SMC/dpcls
> invalidation and subsequent upcalls/packet reorderings.
> 
> 
> I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when moving 
> queues.
> However, EMC/SMC/dpcls are per pmd specific, where would we have packet 
> reordering ?

An Rx queue could be rescheduled to a different PMD thread --> new packets will
go to a different Tx queue. It's unlikely, but which packets get sent first will
depend on the device/driver. The main issue here is that it happens to other
ports, not only to the port we're trying to reconfigure.

> 
> 
> 
> The same issues were discussed previously while looking at the possibility of
> vhost-pmd integration (with some test results):
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html
> 
> 
> Thanks for the link, I will test this.
> 
> 
> 
> One more reference:
> 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of same 
> vhost device.")
> 
> 
> Yes, I saw this patch.
> Are we safe against guest drivers/applications that play with 
> VIRTIO_NET_F_MQ, swapping it continuously ?

Good point. I didn't test that, but it looks like we're not safe here.
Kernel and DPDK drivers have F_MQ enabled by default so it'll require
some changes/explicit disabling. But I agree that this could produce
issues if someone does that.

We could probably handle this using a 'max seen qp_num' approach. But I'm
not a fan of this. We need to figure out how to do this correctly.

In general, I think that we should not allow cases where the guest is able
to manipulate the host configuration.

> 
> 
> 
> 
> Anyway, do you have some numbers on how much time the PMD thread spends on
> polling disabled queues? What performance improvement are you able to achieve
> by avoiding that?
> 
> 
> With a simple pvp setup of mine.
> 1c/2t poll two physical ports.
> 1c/2t poll four vhost ports with 16 queues each.
>   Only one queue is enabled on each virtio device attached by the guest.
>   The first two virtio devices are bound to the virtio kmod.
>   The last two virtio devices are bound to vfio-pci and used to forward 
> incoming traffic with testpmd.
> 
> The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost queues) 
> to 6.2Mpps (polling only the 4 enabled vhost queues).

That's interesting. However, this doesn't look like a realistic scenario.
In practice you'll need many more PMD threads to handle so many queues.
If you add more threads, a zero-loss test could show even worse results
if one of the idle VMs periodically changes the number of queues. Periodic
latency spikes will cause queue overruns and subsequent packet drops on
hot Rx queues. This could be partially solved by allowing n_rxq to grow only.
However, I'd be happy to have a different solution that will not hide the
number of queues from the datapath.

> -- 
> David Marchand


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-08 Thread David Marchand
Hello Ilya,

On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets 
wrote:

> On 04.04.2019 22:49, David Marchand wrote:
> > We tried to lower the number of rebalances but we don't have a
> > satisfying solution at the moment, so this patch rebalances on each
> > update.
>
> Hi.
>
> Triggering the reconfiguration on each vring state change is a bad thing.
> This could be abused by the guest to break the host networking by infinite
> disabling/enabling queues. Each reconfiguration leads to removing ports
> from the PMD port caches and their reloads. On rescheduling all the ports
>

I'd say the reconfiguration itself is not wanted here.
Only rebalancing the queues would be enough.


could be moved to a different PMD threads resulting in EMC/SMC/dpcls
> invalidation and subsequent upcalls/packet reorderings.
>

I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when
moving queues.
However, EMC/SMC/dpcls are per-PMD structures, so where would we have packet
reordering?



> The same issues were discussed previously while looking at the possibility of
> vhost-pmd integration (with some test results):
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html


Thanks for the link, I will test this.



> One more reference:
> 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of same
> vhost device.")
>

Yes, I saw this patch.
Are we safe against guest drivers/applications that play with
VIRTIO_NET_F_MQ, swapping it continuously ?




> Anyway, do you have some numbers on how much time the PMD thread spends on
> polling disabled queues? What performance improvement are you able to achieve
> by avoiding that?
>

With a simple pvp setup of mine.
1c/2t poll two physical ports.
1c/2t poll four vhost ports with 16 queues each.
  Only one queue is enabled on each virtio device attached by the guest.
  The first two virtio devices are bound to the virtio kmod.
  The last two virtio devices are bound to vfio-pci and used to forward
incoming traffic with testpmd.

The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost
queues) to 6.2Mpps (polling only the 4 enabled vhost queues).



-- 
David Marchand


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-08 Thread Ilya Maximets
On 04.04.2019 22:49, David Marchand wrote:
> We currently poll all available queues based on the max queue count
> exchanged with the vhost peer and rely on the vhost library in DPDK to
> check the vring status beneath.
> This can lead to some overhead when we have a lot of unused queues.
> This situation happens easily when you provision more queues than you
> need for your virtual machines.
> 
> To enhance the situation, we can inform the rxq scheduling algorithm to
> skip unused queues. All we need is to catch the per rxq notifications
> and trigger a rebalance by asking for a port reconfigure (without
> changing the port configuration itself).
> 
> A consequence of this is that before a device is connected to the vhost
> port, no rxq is polled at all.
> 
> Signed-off-by: David Marchand 
> ---
> 
> We tried to lower the number of rebalances but we don't have a
> satisfying solution at the moment, so this patch rebalances on each
> update.

Hi.

Triggering the reconfiguration on each vring state change is a bad thing.
This could be abused by the guest to break the host networking by infinitely
disabling/enabling queues. Each reconfiguration leads to removing ports
from the PMD port caches and their reloads. On rescheduling, all the ports
could be moved to different PMD threads, resulting in EMC/SMC/dpcls
invalidation and subsequent upcalls/packet reorderings.

The same issues were discussed previously while looking at the possibility of
vhost-pmd integration (with some test results):
https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html
One more reference:
7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of same vhost 
device.")

One possible way to improve your case, where the guest never uses all the
queues allocated in QEMU, is to request the reconfiguration only
if we're receiving a "vring enabled" event for a queue with a higher
number than ever seen on this device, i.e.:

if (is_rx && enable && qid >= dev->requested_n_rxq) {
    dev->requested_n_rxq = qid + 1;
    netdev_request_reconfigure(&dev->up);
}

At least this will limit the number of reconfigurations to the number of queues.
But I'm still not completely sure if this change is good enough. (I believe
we discussed this solution somewhere on the list, but I can't find it right now.)
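
For illustration, here is a stripped-down sketch of how that check could slot
into the existing vring_state_changed() callback in netdev-dpdk. The real
callback also tracks per-txq state, remaps tx queues and logs; all of that is
omitted, so treat this only as a sketch:

    static int
    vring_state_changed(int vid, uint16_t queue_id, int enable)
    {
        struct netdev_dpdk *dev;
        char ifname[IF_NAME_SZ];
        /* In OVS, the guest's TX vring is the host's RX queue. */
        bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ;
        int qid = queue_id / VIRTIO_QNUM;

        rte_vhost_get_ifname(vid, ifname, sizeof ifname);

        ovs_mutex_lock(&dpdk_mutex);
        LIST_FOR_EACH (dev, list_node, &dpdk_list) {
            ovs_mutex_lock(&dev->mutex);
            if (nullable_string_is_equal(ifname, dev->vhost_id)) {
                if (is_rx && enable && qid >= dev->requested_n_rxq) {
                    /* Grow only: at most one reconfiguration per newly
                     * seen queue. */
                    dev->requested_n_rxq = qid + 1;
                    netdev_request_reconfigure(&dev->up);
                }
                ovs_mutex_unlock(&dev->mutex);
                break;
            }
            ovs_mutex_unlock(&dev->mutex);
        }
        ovs_mutex_unlock(&dpdk_mutex);

        return 0;
    }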

Anyway, do you have some numbers on how much time the PMD thread spends on polling
disabled queues? What performance improvement are you able to achieve by
avoiding that?

Best regards, Ilya Maximets.