On Fri, Jan 13, 2023 at 4:57 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>> @@ -5766,6 +5804,7 @@ static const struct netdev_class dpdk_vhost_class = 
> >>> {
> >>>  static const struct netdev_class dpdk_vhost_client_class = {
> >>>      .type = "dpdkvhostuserclient",
> >>>      NETDEV_DPDK_CLASS_COMMON,
> >>> +    .run = netdev_dpdk_vhost_run,
> >>
> >> This part is also needed for the 'server' type.
> >>
> >> However, there is no guarantee that run() will be called in any reasonable
> >> time, because we don't have a wait() method that would schedule a wake up
> >> for the main thread.  The main thread does wake up for other reasons,
> >> like stats updates, but that doesn't happen frequently and we can't
> >> rely on it anyway.  For example, the default stats update interval is
> >> 5 seconds.
> >>
> >> The problem is that we can't really iterate over devices and take 
> >> dev->mutex
> >> in the context of a callback.  Meaning we can't really implement a good
> >> wait() method that would trigger some seqno change that would wake up the
> >> main thread.  We could trigger a global connectivity seqno, I guess.  Not
> >> sure how safe it is, since sequence numbers use internal mutex.
> >>
> >> OTOH, maybe we can employ the dpdk-watchdog thread to perform vhost
> >> housekeeping?  It currently does a very similar thing for physical
> >> ports, i.e. it updates the link state periodically.  Its sleep interval
> >> is a bit large though, but maybe we could tweak that, or keep updating
> >> phy link states at the current interval while checking the vhost event
> >> queue a bit more frequently.
> >> Maybe even have something like a backoff in dp_netdev_flow_offload_main().
> >
> > Having a dedicated thread seems safer, for the same reason Maxime
> > mentioned for the OVS main thread: vhost events may be triggered by a
> > malicious guest, and OVS would be blind to some phy link updates.
> >
> >
> > Another thought on my patch... is it safe to have an unbounded queue
> > with respect to memory consumption?
> > Again, a malicious guest could make OVS consume a lot of memory (in my
> > tests, OVS complained that the heap increased to 500M).

*Note*: the heap was at ~400M before starting the test, so the actual
increase was around 100M.

The likely reason was that .run() was not invoked often enough, so the
queue kept growing with pending updates.
Now that I went with the dedicated thread approach, all I see is an
increase of 1-2M, maximum.
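To make the idea concrete, here is a hedged sketch of the dedicated-thread
approach: a small bounded queue of vhost events drained by its own consumer
thread.  All names here are hypothetical (not the actual patch), and plain
pthreads stand in for OVS's own threading helpers.  Bounding the queue caps
the memory a malicious guest can make OVS consume, and the dedicated
consumer keeps the queue close to empty in practice, which matches the
1-2M figure above.

```c
#include <pthread.h>
#include <stdbool.h>

#define VHOST_EVT_QUEUE_SIZE 64

struct vhost_evt_queue {
    int events[VHOST_EVT_QUEUE_SIZE];   /* e.g. encoded (dev, qid) updates */
    int head, tail, len;
    bool done;                          /* set at shutdown */
    long processed;                     /* filled in by the consumer */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
};

/* Called from the vhost event callback.  Returns false when the queue is
 * full, letting the caller coalesce or drop the update instead of letting
 * memory grow without bound. */
static bool
vhost_evt_enqueue(struct vhost_evt_queue *q, int evt)
{
    bool queued = false;

    pthread_mutex_lock(&q->mutex);
    if (q->len < VHOST_EVT_QUEUE_SIZE) {
        q->events[q->tail] = evt;
        q->tail = (q->tail + 1) % VHOST_EVT_QUEUE_SIZE;
        q->len++;
        queued = true;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->mutex);
    return queued;
}

/* Dedicated housekeeping thread: drains events as they arrive instead of
 * waiting for the main thread's run(). */
static void *
vhost_evt_thread(void *arg)
{
    struct vhost_evt_queue *q = arg;

    pthread_mutex_lock(&q->mutex);
    while (q->len || !q->done) {
        if (!q->len) {
            pthread_cond_wait(&q->cond, &q->mutex);
            continue;
        }
        /* A real implementation would apply the update to the netdev here. */
        q->head = (q->head + 1) % VHOST_EVT_QUEUE_SIZE;
        q->len--;
        q->processed++;
    }
    pthread_mutex_unlock(&q->mutex);
    return NULL;
}
```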

I think we can avoid going with your suggestion.
WDYT?


>
> We're entering gross over-engineering territory, but the solution
> might be the following:
>
> An RCU-list whose nodes carry the following elements:
>  - ifname
>  - qid
>  - atomic_bool enabled
>  - atomic_bool seen
>  - timestamp
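For reference, such a node could look roughly like this in C.  This is
only a sketch of the element list quoted above: the names are mine, and
C11 stdatomic stands in for OVS's ovs-atomic/ovs-rcu machinery (a real
node would also embed an rculist link).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical node for the RCU-list sketched above. */
struct vhost_event_node {
    char ifname[64];          /* vhost interface name. */
    int qid;                  /* queue id the event refers to. */
    atomic_bool enabled;      /* last enable/disable state reported. */
    atomic_bool seen;         /* cleared on update, set once consumed. */
    long long timestamp;      /* msec timestamp of the last update. */
};

/* Record a state change from the vhost callback; the housekeeping
 * thread later picks up nodes with 'seen' == false. */
static struct vhost_event_node *
vhost_event_node_update(struct vhost_event_node *node, bool enabled,
                        long long now)
{
    atomic_store(&node->enabled, enabled);
    atomic_store(&node->seen, false);
    node->timestamp = now;
    return node;
}
```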


-- 
David Marchand

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
