On Tue, Sep 18, 2018 at 7:23 PM Y Song <ys114...@gmail.com> wrote:
>
> On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
> <magnus.karls...@intel.com> wrote:
> >
> > Previously, the xsk code did not record which umem was bound to a
> > specific queue id. This was not required if all drivers were zero-copy
> > enabled as this had to be recorded in the driver anyway. So if a user
> > tried to bind two umems to the same queue, the driver would say
> > no. But if copy-mode was first enabled and then zero-copy mode (or the
> > reverse order), we mistakenly enabled both of them on the same umem
> > leading to buggy behavior. The main culprit for this is that we did
> > not store the association of umem to queue id in the copy case and
> > only relied on the driver reporting this. As this relation was not
> > stored in the driver for copy mode (it does not rely on the AF_XDP
> > NDOs), this obviously could not work.
> >
> > This patch fixes the problem by always recording the umem to queue id
> > relationship in the netdev_queue and netdev_rx_queue structs. This way
> > we always know what kind of umem has been bound to a queue id and can
> > act appropriately at bind time.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karls...@intel.com>
> > ---
> >  net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
> >  net/xdp/xdp_umem.h |  2 +-
> >  2 files changed, 71 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > index b3b632c..12300b5 100644
> > --- a/net/xdp/xdp_umem.c
> > +++ b/net/xdp/xdp_umem.c
> > @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
> >         }
> >  }
> >
> > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > + * not know if the device has more tx queues than rx, or the opposite.
> > + * This might also change during run time.
> > + */
> > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > +                               u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               dev->_rx[queue_id].umem = umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               dev->_tx[queue_id].umem = umem;
> > +}
> > +
> > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > +                                             u16 queue_id)
> > +{
> > +       if (queue_id < dev->real_num_rx_queues)
> > +               return dev->_rx[queue_id].umem;
> > +       if (queue_id < dev->real_num_tx_queues)
> > +               return dev->_tx[queue_id].umem;
> > +
> > +       return NULL;
> > +}
> > +
> > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > +{
> > +       /* Zero out the entry independent of how many queues are configured
> > +        * at this point in time, as it might be used in the future.
> > +        */
> > +       if (queue_id < dev->num_rx_queues)
> > +               dev->_rx[queue_id].umem = NULL;
> > +       if (queue_id < dev->num_tx_queues)
> > +               dev->_tx[queue_id].umem = NULL;
> > +}
> > +
>
> I am not sure whether the following scenario can happen or not.
> Could you clarify?
> 1. suppose initially we have num_rx_queues = num_tx_queues = 10,
>    xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> 2. num_tx_queues is changed to 5
You probably mean real_num_tx_queues here. This is the current number
of queues configured via e.g. ethtool. num_tx_queues will not change
unless you change device (or device driver).

> 3. xdp_clear_umem_at_qid() is called for queue_id = 8,
>    and dev->_rx[8].umem = 0.

At this point both _rx[8].umem and _tx[8].umem are set to NULL, as the
test is against num_[rx|tx]_queues, which is the max allowed for the
device, not the currently configured one, which is
real_num_tx_queues. With this in mind, the scenario below will not
happen. Do you agree?

> 4. xdp_reg_umem_at_qid() is called again to set, for queue_id = 8,
>    dev->_rx[8].umem = umem2
> 5. num_tx_queues is changed to 10
>    Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
>    a problem?
>
> >  int xdp_umem_query(struct net_device *dev, u16 queue_id)
> >  {
> >         struct netdev_bpf bpf;
> > @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
> >  }
> >
> >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                       u32 queue_id, u16 flags)
> > +                       u16 queue_id, u16 flags)
> >  {
> >         bool force_zc, force_copy;
> >         struct netdev_bpf bpf;
> > -       int err;
> > +       int err = 0;
> >
> >         force_zc = flags & XDP_ZEROCOPY;
> >         force_copy = flags & XDP_COPY;
> > @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> >         if (force_zc && force_copy)
> >                 return -EINVAL;
> >
> > +       rtnl_lock();
> > +       if (xdp_get_umem_from_qid(dev, queue_id)) {
> > +               err = -EBUSY;
> > +               goto rtnl_unlock;
> > +       }
> > +
> > +       xdp_reg_umem_at_qid(dev, umem, queue_id);
> > +       umem->dev = dev;
> > +       umem->queue_id = queue_id;
> >         if (force_copy)
> > -               return 0;
> > +               /* For copy-mode, we are done. */
> > +               goto rtnl_unlock;
> >
> > -       if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> > -               return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> > +       if (!dev->netdev_ops->ndo_bpf ||
> > +           !dev->netdev_ops->ndo_xsk_async_xmit) {
> > +               err = -EOPNOTSUPP;
> > +               goto err_unreg_umem;
> > +       }
> >
> > -       rtnl_lock();
> >         err = xdp_umem_query(dev, queue_id);
> > -       if (err) {
> > -               err = err < 0 ? -EOPNOTSUPP : -EBUSY;
> > -               goto err_rtnl_unlock;
> > -       }
> > +       if (err)
> > +               goto err_unreg_umem;
> >
> >         bpf.command = XDP_SETUP_XSK_UMEM;
> >         bpf.xsk.umem = umem;
> > @@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> >
> >         err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> >         if (err)
> > -               goto err_rtnl_unlock;
> > +               goto err_unreg_umem;
> >         rtnl_unlock();
> >
> >         dev_hold(dev);
> > -       umem->dev = dev;
> > -       umem->queue_id = queue_id;
> >         umem->zc = true;
> >         return 0;
> >
> > -err_rtnl_unlock:
> > +err_unreg_umem:
> > +       xdp_clear_umem_at_qid(dev, queue_id);
>
> You did not clear umem->dev and umem->queue_id, is it a problem here?
> For example, in xdp_umem_clear_dev(), umem->dev is checked.

As the umem might be shared, I cannot clear these variables as they
are used by another socket when it is shared. The umem->dev check in
xdp_umem_clear_dev() is for sockets that are killed when only halfway
set up. In that case, umem->dev might still be NULL.

> > +       if (!force_zc)
> > +               err = 0; /* fallback to copy mode */
> > +rtnl_unlock:
> >         rtnl_unlock();
> > -       return force_zc ? err : 0; /* fail or fallback */
> > +       return err;
> >  }
> >
> >  static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > @@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> >         struct netdev_bpf bpf;
> >         int err;
> >
> > -       if (umem->dev) {
> > +       if (umem->zc) {
> >                 bpf.command = XDP_SETUP_XSK_UMEM;
> >                 bpf.xsk.umem = NULL;
> >                 bpf.xsk.queue_id = umem->queue_id;
> > @@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> >                         WARN(1, "failed to disable umem!\n");
> >
> >                 dev_put(umem->dev);
> > -               umem->dev = NULL;
> > +               umem->zc = false;
> > +       }
> > +
> > +       if (umem->dev) {
> > +               rtnl_lock();
> > +               xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> > +               rtnl_unlock();
>
> Previously, umem->dev is reset to NULL. Now, it is left as is. I
> assume it is not possible that this function xdp_umem_clear_dev() is
> called again for this umem, right?

This function will only be called once for each umem / queue_id pair.
We only allow one such binding with this patch. This is what was
broken in the original code.

Thanks Song for your review. I appreciate that you take a look at my
code.

/Magnus

> >         }
> >  }
> >
> > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> > index c8be1ad..2760322 100644
> > --- a/net/xdp/xdp_umem.h
> > +++ b/net/xdp/xdp_umem.h
> > @@ -9,7 +9,7 @@
> >  #include <net/xdp_sock.h>
> >
> >  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > -                       u32 queue_id, u16 flags);
> > +                       u16 queue_id, u16 flags);
> >  bool xdp_umem_validate_queues(struct xdp_umem *umem);
> >  void xdp_get_umem(struct xdp_umem *umem);
> >  void xdp_put_umem(struct xdp_umem *umem);
> > --
> > 2.7.4
> >