On Sun, Aug 02, 2020 at 10:29:07PM +0200, Olivier Matz wrote:
> Hi,
>
> On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote:
> >
> >
> > At 2020-07-31 23:15:43, "Olivier Matz" <[email protected]> wrote:
> > >Hi,
> > >
> > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, [email protected] wrote:
> > >> From: Yi Yang <[email protected]>
> > >>
> > >> In GSO case, segmented mbufs are attached to original
> > >> mbuf which can't be freed when it is external. The issue
> > >> is free_cb doesn't know original mbuf and doesn't free
> > >> it when refcnt of shinfo is 0.
> > >>
> > >> Original mbuf can be freed by rte_pktmbuf_free segmented
> > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of
> > >> cases should have different behaviors. free_cb won't
> > >> explicitly call rte_pktmbuf_free to free original mbuf
> > >> if it is freed by rte_pktmbuf_free original mbuf, but it
> > >> has to call rte_pktmbuf_free to free original mbuf if it
> > >> is freed by rte_pktmbuf_free segmented mbufs.
> > >>
> > >> In order to fix this issue, free_cb interface has to been
> > >> changed, __rte_pktmbuf_free_extbuf must deliver called
> > >> mbuf pointer to free_cb, argument opaque can be defined
> > >> as a custom struct by user, it can includes original mbuf
> > >> pointer, user-defined free_cb can compare caller mbuf with
> > >> mbuf in opaque struct, free_cb should free original mbuf
> > >> if they are not same, this corresponds to rte_pktmbuf_free
> > >> segmented mbufs case, otherwise, free_cb won't free original
> > >> mbuf because the caller explicitly called rte_pktmbuf_free
> > >> to free it.
> > >>
> > >> Here is pseduo code to show two kind of cases.
> > >>
> > >> case 1. rte_pktmbuf_free segmented mbufs
> > >>
> > >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */
> > >> &gso_ctx,
> > >> /* segmented mbuf */
> > >> (struct rte_mbuf **)&gso_mbufs,
> > >> MAX_GSO_MBUFS);
> > >
> > >I'm sorry but it is not very clear to me what operations are done by
> > >rte_gso_segment().
> > >
> > >In the current code, I only see calls to rte_pktmbuf_attach(),
> > >which do not deal with external buffers. Am I missing something?
> > >
> > >Are you able to show the issue only with mbuf functions? It would
> > >be helpful to understand what does not work.
> > >
> > >
> > >Thanks,
> > >Olivier
> > >
> > Oliver, thank you for comment, let me show you why it doesn't work for my
> > use case. In OVS DPDK, VM uses vhostuserclient to send large packets whose
> > size is about 64K because we enabled TSO & UFO, these large packets use
> > rte_mbufs allocated by DPDK virtio_net function
> > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please
> > refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the
> > same allocate function and the same free_cb no matter they are TCP packet
> > or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP
> > fragment offload, so OVS DPDK has to do it by software, for UDP case, the
> > original rte_mbuf only can be freed by segmented rte_mbufs which are output
> > packets of rte_gso_segment, i.e. the original rte_mbuf only can freed by
> > free_cb, you can see, it explicitly called rte_pktmbuf_free(arg->mbuf), the
> > condition statement "if (caller_m != arg->mbuf)" is true for this case,
> > this has no problem, but for TCP case, the original mbuf is delivered to
> > rte_eth_tx_burst() but not segmented rte_mbufs output by rte_gso_segment,
> > PMD driver will call rte_pktmbuf_free(original_rte_mbuf) but not
> > rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called,
> > that means original_rte_mbuf will be freed twice, you know what will
> > happen, this is just the issue I'm fixing. I bring in caller_m argument, it
> > can help work around this because caller_m is arg->mbuf and the condition
> > statement "if (caller_m != arg->mbuf)" is false, you can't fix it without
> > the change this patch series did.
>
> I'm sill not sure to get your issue. Please, if you have a simple test
> case using only mbufs functions (without virtio, gso, ...), it would be
> very helpful because we will be sure that we are talking about the same
> thing. In case there is an issue, it can easily become a unit test.
>
> That said, I looked at vhost mbuf allocation and gso segmentation, and
> I found some strange things:
>
> 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the
> ext mbuf.
>
> a/ The first one stores the shinfo struct in the mbuf, basically
> like this:
>
> pkt = rte_pktmbuf_alloc(mp);
> shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *);
> buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
> shinfo->free_cb = virtio_dev_extbuf_free;
> shinfo->fcb_opaque = buf;
> rte_mbuf_ext_refcnt_set(shinfo, 1);
>
> I don't think it is a good idea, because there is no guarantee that
> the mbuf won't be freed before the buffer. For instance, doing
> this will probably fail:
>
> pkt2 = rte_pktmbuf_alloc(mp);
> rte_pktmbuf_attach(pkt2, pkt);
> rte_pktmbuf_free(pkt); /* pkt is freed, but it contains shinfo ! */
>
> To do this properly, the mbuf refcnt should be increased, and
> the mbuf should be freed in the callback. But I don't think it's
> worth doing it, given the second path (b/) looks good to me.
>
> b/ The second path stores the shinfo struct at the end of the
> allocated buffer, like this:
>
> pkt = rte_pktmbuf_alloc(mp);
> buf_len += sizeof(*shinfo) + sizeof(uintptr_t);
> buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
> buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
> shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
> virtio_dev_extbuf_free, buf);
>
> I think this is correct, because we have the guarantee that shinfo
> exists as long as the buffer exists.
>
> 2/ in rte_gso_segment(), there is a loop like this:
>
> while (pkt_seg) {
> rte_mbuf_refcnt_update(pkt_seg, -1);
> pkt_seg = pkt_seg->next;
> }
>
> You change it to take in account the refcnt for ext mbufs.
>
> I may have missed something but I wonder why it is not simply:
>
> rte_pktmbuf_free(pkt_seg);
Here, I mean rte_pktmbuf_free(pkt)
(one free() call for the mbuf chain, not one per segment)
>
> It will decrease the proper refcnt, and free the mbufs if they
> are not used.
>
> Again, sorry if this is not the issue your are referring to, but
> in this case I really think that having a simple example code that
> shows the issue would help.
>
> Regards,
> Olivier
>
>
>
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index e663fd4..3b69cbb 100644
> >
> > --- a/lib/librte_vhost/virtio_net.c
> >
> > +++ b/lib/librte_vhost/virtio_net.c
> >
> > @@ -2136,10 +2136,20 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid,
> > uint16_t queue_id,
> >
> > return NULL;
> >
> > }
> >
> >
> >
> > +struct shinfo_arg {
> >
> > + void *buf;
> >
> > + struct rte_mbuf *mbuf;
> >
> > +};
> >
> > +
> >
> > static void
> >
> > -virtio_dev_extbuf_free(struct rte_mbuf * caller_m __rte_unused, void
> > *opaque)
> >
> > +virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque)
> >
> > {
> >
> > - rte_free(opaque);
> >
> > + struct shinfo_arg *arg = (struct shinfo_arg *)opaque;
> >
> > +
> >
> > + rte_free(arg->buf);
> >
> > + if (caller_m != arg->mbuf)
> >
> > + rte_pktmbuf_free(arg->mbuf);
> >
> > + rte_free(arg);
> >
> > }
> >
> >
> >
> > static int
> >
> > @@ -2172,8 +2182,14 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid,
> > uint16_t queue_id,
> >
> >
> >
> > /* Initialize shinfo */
> >
> > if (shinfo) {
> >
> > + struct shinfo_arg *arg = (struct shinfo_arg *)
> >
> > + rte_malloc(NULL,
> > sizeof(struct shinfo_arg),
> >
> > +
> > RTE_CACHE_LINE_SIZE);
> >
> > +
> >
> > + arg->buf = buf;
> >
> > + arg->mbuf = pkt;
> >
> > shinfo->free_cb = virtio_dev_extbuf_free;
> >
> > - shinfo->fcb_opaque = buf;
> >
> > + shinfo->fcb_opaque = arg;
> >
> > rte_mbuf_ext_refcnt_set(shinfo, 1);
> >
> > } else {
> >
> > shinfo =
> > rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
> >
> > --
> >
> > 1.8.3.1
> >
> >
> >
> > /*
> > * Allocate a host supported pktmbuf.
> > */
> > static __rte_always_inline struct rte_mbuf *
> > virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
> > uint32_t data_len)
> > {
> > struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
> >
> > if (unlikely(pkt == NULL)) {
> > VHOST_LOG_DATA(ERR,
> > "Failed to allocate memory for mbuf.\n");
> > return NULL;
> > }
> >
> > if (rte_pktmbuf_tailroom(pkt) >= data_len)
> > return pkt;
> >
> > /* attach an external buffer if supported */
> > if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len))
> > return pkt;
> >
> > /* check if chained buffers are allowed */
> > if (!dev->linearbuf)
> > return pkt;
> >
> > /* Data doesn't fit into the buffer and the host supports
> > * only linear buffers
> > */
> > rte_pktmbuf_free(pkt);
> >
> > return NULL;
> > }
> >