Hi Keith, > -----Original Message----- > From: Wiles, Roger Keith [mailto:keith.wiles at windriver.com] > Sent: Tuesday, October 07, 2014 3:22 PM > To: Ananyev, Konstantin > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 2/2] Adding the routines > rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free_bulk() > > > On Oct 7, 2014, at 4:09 AM, Ananyev, Konstantin <konstantin.ananyev at > intel.com> wrote: > > > > > > >> -----Original Message----- > >> From: Wiles, Roger Keith [mailto:keith.wiles at windriver.com] > >> Sent: Monday, October 06, 2014 9:08 PM > >> To: Ananyev, Konstantin > >> Cc: dev at dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 2/2] Adding the routines > >> rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free_bulk() > >> > >> Attaching to the list does not work. If you want the code let me know it > >> is only about 5K in size. > >> > >> On Oct 6, 2014, at 2:45 PM, Wiles, Roger Keith <keith.wiles at > >> windriver.com> wrote: > >> > >>> > >>> On Oct 6, 2014, at 11:13 AM, Wiles, Roger Keith <keith.wiles at > >>> windriver.com> wrote: > >>> > >>>> > >>>> On Oct 6, 2014, at 10:54 AM, Ananyev, Konstantin <konstantin.ananyev at > >>>> intel.com> wrote: > >>>> > >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce > >>>>>> Richardson > >>>>>> Sent: Monday, October 06, 2014 3:54 PM > >>>>>> To: Wiles, Roger Keith (Wind River) > >>>>>> Cc: dev at dpdk.org > >>>>>> Subject: Re: [dpdk-dev] [PATCH 2/2] Adding the routines > >>>>>> rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free_bulk() > >>>>>> > >>>>>> On Mon, Oct 06, 2014 at 03:50:38PM +0100, Wiles, Roger Keith wrote: > >>>>>>> Hi Bruce, > >>>>>>> > >>>>>>> Do I need to reject the for the new routines or just make sure the > >>>>>>> vector driver does not get updated to use those routines? > >>>>>>> > >>>>>> > >>>>>> The new routines are probably useful in the general case. I see no > >>>>>> issue > >>>>>> with having them in the code, so long as the vector driver is not > >>>>>> modified > >>>>>> to use them. > >>>>> > >>>>> I 'd say the same thing for non-vector RX/TX PMD code-paths too. > >>>>> > >>>>> BTW, are the new functions comments valid? > >>>>> > >>>>> + * @return > >>>>> + * - 0 if the number of mbufs allocated was ok > >>>>> + * - <0 is an ERROR. > >>>>> + */ > >>>>> +static inline int __rte_mbuf_raw_alloc_bulk( > >>>>> > >>>>> Though, as I can see __rte_mbuf_raw_alloc_bulk() returns either: > >>>>> - number of allocated mbuf (cnt) > >>>>> - negative error code > >>>> > >>>> Let me fix up the comments. > >>>>> > >>>>> And: > >>>>> + * @return > >>>>> + * - The number of valid mbufs pointers in the m_list array. > >>>>> + * - Zero if the request cnt could not be allocated. > >>>>> + */ > >>>>> +static inline int __attribute__((always_inline)) > >>>>> +rte_pktmbuf_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf > >>>>> *m_list[], int16_t cnt) > >>>>> +{ > >>>>> + return __rte_mbuf_raw_alloc_bulk(mp, m_list, cnt); > >>>>> +} > >>>>> > >>>>> Shouldn't be "less than zero if the request cnt could not be > >>>>> allocated."? > >>>>> > >>>>> BTW, is there any point to have __rte_mbuf_raw_alloc_bulk() at all? > >>>>> After all, as you are calling rte_pktmbuf_reset() inside it, it doesn't > >>>>> look __raw__ any more. > >>>>> Might be just put its content into rte_pktmbuf_alloc_bulk() and get rid > >>>>> of it. > >>>>> > >>>> I was just following the non-bulk routine style __rte_mbuf_raw_alloc(), > >>>> but I can pull that into a single routine. > >>>> > >>>>> Also wonder, what is the advantage of having multiple counters inside > >>>>> the same loop? > >>>>> i.e: > >>>>> + for(i = 0; i < cnt; i++) { > >>>>> + m = *m_list++; > >>>>> > >>>>> Why not just: > >>>>> > >>>>> for(i = 0; i < cnt; i++) { > >>>>> m = &m_list[i]; > >>>>> > >>>>> Same for free: > >>>>> + while(npkts--) > >>>>> + rte_pktmbuf_free(*m_list++); > >>>>> > >>>>> While not just: > >>>>> for (i = 0; i < npkts; i++) > >>>>> rte_pktmbuf_free(&m_list[i]); > >>>> > >>>> Maybe I have it wrong or the compilers are doing the right thing now, > >>>> but at one point the &m_list[i] would cause the compiler > to > >> generate a shift or multiple of 'i' and then add it to the base of m_list. > >> If that is not the case anymore then I can update the code as > >> you suggested. Using the *m_list++ just adds the size of a pointer to a > >> register and continues. > >>> > >>> I compared the clang assembler (.s file) output from an example test code > >>> I wrote to see if we have any differences in the code > >> using the two styles and I found no difference and the code looked the > >> same. I am not a Intel assembler expert and I would > suggest > >> someone else determine if it generates different code. I tried to compare > >> the GCC outputs and it did look the same to me. > > > > That's was my question: > > Modern compilers are able to generate a good code for a simple loop as > > above. > > So what's the point to use 2 iterators inside the loop, when just one is > > enough? > > Nothing wrong technically, but makes code a bit harder to follow. > > Plus, in general, it is a good practise to minimise number of iterators > > inside the loop, when possible. > > > > Konstantin > > Hi Konstantin, > > I really do not understand the concern if the code is the same, as it appears > to me the current patch is very clean and simple. Maybe > you have not seen the v2 patch and now v3 patch I sent this morning to fix > Bruce's comment suggestion. > > For the case of the free routine your suggestion would require an extra > counter/variable a bit more code a 'for' loop instead of a > 'while' loop.
My point was that just one iterator for both loops is enough. In general, it is a good practise to minimise number of iterators per loop if possible: in some cases compiler might get confused and wouldn't be able to eliminate redundant iterators itself. Though yes - technically there is nothing wrong with your approach. So if you prefer to keep it as it is - I wouldn't insist. Konstantin > +static inline void __attribute__((always_inline)) > +rte_pktmbuf_free_bulk(struct rte_mbuf *m_list[], int16_t npkts) > +{ > + while(npkts--) > + rte_pktmbuf_free(*m_list++); > +} > > For the case of the alloc routine I did remove the rte_mbuf * m variable and > now I believe it is very clean and changing it to use index > variables is just a personal preference. I personal preference of this type > is not useful IMO and does not cause any harm. Unless you > can suggest a good technical reason to change I am going to leave the patch > as is. > > +static inline int __attribute__((always_inline)) > +rte_pktmbuf_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf *m_list[], > int16_t cnt) > +{ > + int ret; > + > + ret = rte_mempool_get_bulk(mp, (void **)m_list, cnt); > + if ( ret == 0 ) { > + ret = cnt; > + while(cnt--) { > +#ifdef RTE_MBUF_REFCNT > + rte_mbuf_refcnt_set(*m_list, 1); > +#endif /* RTE_MBUF_REFCNT */ > + rte_pktmbuf_reset(*m_list++); > + } > + } > + return ret; > +} > > >>> > >>> I have attached the code and output, please let me know if I did > >>> something wrong, but as it stands using the original style is what I > >> want to go with. > >>> > >>>>> > >>>>> Konstantin > >>>>> > >>>>>> > >>>>>> /Bruce > >>>>>> > >>>>>>> Thanks > >>>>>>> ++Keith > >>>>>>> > >>>>>>> On Oct 6, 2014, at 3:56 AM, Richardson, Bruce <bruce.richardson at > >>>>>>> intel.com> wrote: > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Keith Wiles > >>>>>>>>> Sent: Sunday, October 05, 2014 12:10 AM > >>>>>>>>> To: dev at dpdk.org > >>>>>>>>> Subject: [dpdk-dev] [PATCH 2/2] Adding the routines > >>>>>>>>> rte_pktmbuf_alloc_bulk() > >>>>>>>>> and rte_pktmbuf_free_bulk() > >>>>>>>>> > >>>>>>>>> Minor helper routines to mirror the mempool routines and remove the > >>>>>>>>> code > >>>>>>>>> from applications. The ixgbe_rxtx_vec.c routine could be changed to > >>>>>>>>> use > >>>>>>>>> the ret_pktmbuf_alloc_bulk() routine inplace of > >>>>>>>>> rte_mempool_get_bulk(). > >>>>>>>>> > >>>>>>>> > >>>>>>>> I believe such a change would cause a performance regression, as the > >>>>>>>> extra init code in the alloc_bulk() function would > take > >>>>>> additional cycles and is not needed. The vector routines use the > >>>>>> mempool function directly, so that there is no overhead of > >> mbuf > >>>>>> initialization, as the vector routines use their additional > >>>>>> "knowledge" of what the mbufs will be used for to init them in a faster > >> manner > >>>>>> than can be done inside the mbuf library. > >>>>>>>> > >>>>>>>> /Bruce > >>>>>>>> > >>>>>>>>> Signed-off-by: Keith Wiles <keith.wiles at windriver.com> > >>>>>>>>> --- > >>>>>>>>> lib/librte_mbuf/rte_mbuf.h | 77 > >>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++ > >>>>>>>>> 1 file changed, 77 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > >>>>>>>>> index 1c6e115..f298621 100644 > >>>>>>>>> --- a/lib/librte_mbuf/rte_mbuf.h > >>>>>>>>> +++ b/lib/librte_mbuf/rte_mbuf.h > >>>>>>>>> @@ -546,6 +546,41 @@ static inline void rte_pktmbuf_reset(struct > >>>>>>>>> rte_mbuf > >>>>>>>>> *m) > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> /** > >>>>>>>>> + * @internal Allocate a list of mbufs from mempool *mp*. > >>>>>>>>> + * The use of that function is reserved for RTE internal needs. > >>>>>>>>> + * Please use rte_pktmbuf_alloc_bulk(). > >>>>>>>>> + * > >>>>>>>>> + * @param mp > >>>>>>>>> + * The mempool from which mbuf is allocated. > >>>>>>>>> + * @param m_list > >>>>>>>>> + * The array to place the allocated rte_mbufs pointers. > >>>>>>>>> + * @param cnt > >>>>>>>>> + * The number of mbufs to allocate > >>>>>>>>> + * @return > >>>>>>>>> + * - 0 if the number of mbufs allocated was ok > >>>>>>>>> + * - <0 is an ERROR. > >>>>>>>>> + */ > >>>>>>>>> +static inline int __rte_mbuf_raw_alloc_bulk(struct rte_mempool > >>>>>>>>> *mp, struct > >>>>>>>>> rte_mbuf *m_list[], int cnt) > >>>>>>>>> +{ > >>>>>>>>> + struct rte_mbuf *m; > >>>>>>>>> + int ret; > >>>>>>>>> + > >>>>>>>>> + ret = rte_mempool_get_bulk(mp, (void **)m_list, cnt); > >>>>>>>>> + if ( ret == 0 ) { > >>>>>>>>> + int i; > >>>>>>>>> + for(i = 0; i < cnt; i++) { > >>>>>>>>> + m = *m_list++; > >>>>>>>>> +#ifdef RTE_MBUF_REFCNT > >>>>>>>>> + rte_mbuf_refcnt_set(m, 1); > >>>>>>>>> +#endif /* RTE_MBUF_REFCNT */ > >>>>>>>>> + rte_pktmbuf_reset(m); > >>>>>>>>> + } > >>>>>>>>> + ret = cnt; > >>>>>>>>> + } > >>>>>>>>> + return ret; > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +/** > >>>>>>>>> * Allocate a new mbuf from a mempool. > >>>>>>>>> * > >>>>>>>>> * This new mbuf contains one segment, which has a length of 0. The > >>>>>>>>> pointer > >>>>>>>>> @@ -671,6 +706,32 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> /** > >>>>>>>>> + * Allocate a list of mbufs from a mempool into a mbufs array. > >>>>>>>>> + * > >>>>>>>>> + * This mbuf list contains one segment per mbuf, which has a > >>>>>>>>> length of 0. The > >>>>>>>>> pointer > >>>>>>>>> + * to data is initialized to have some bytes of headroom in the > >>>>>>>>> buffer > >>>>>>>>> + * (if buffer size allows). > >>>>>>>>> + * > >>>>>>>>> + * The routine is just a simple wrapper routine to reduce code in > >>>>>>>>> the application > >>>>>>>>> and > >>>>>>>>> + * provide a cleaner API for multiple mbuf requests. > >>>>>>>>> + * > >>>>>>>>> + * @param mp > >>>>>>>>> + * The mempool from which the mbuf is allocated. > >>>>>>>>> + * @param m_list > >>>>>>>>> + * An array of mbuf pointers, cnt must be less then or equal to > >>>>>>>>> the size of the > >>>>>>>>> list. > >>>>>>>>> + * @param cnt > >>>>>>>>> + * Number of slots in the m_list array to fill. > >>>>>>>>> + * @return > >>>>>>>>> + * - The number of valid mbufs pointers in the m_list array. > >>>>>>>>> + * - Zero if the request cnt could not be allocated. > >>>>>>>>> + */ > >>>>>>>>> +static inline int __attribute__((always_inline)) > >>>>>>>>> +rte_pktmbuf_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf > >>>>>>>>> *m_list[], > >>>>>>>>> int16_t cnt) > >>>>>>>>> +{ > >>>>>>>>> + return __rte_mbuf_raw_alloc_bulk(mp, m_list, cnt); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +/** > >>>>>>>>> * Free a segment of a packet mbuf into its original mempool. > >>>>>>>>> * > >>>>>>>>> * Free an mbuf, without parsing other segments in case of chained > >>>>>>>>> @@ -708,6 +769,22 @@ static inline void rte_pktmbuf_free(struct > >>>>>>>>> rte_mbuf > >>>>>>>>> *m) > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +/** > >>>>>>>>> + * Free a list of packet mbufs back into its original mempool. > >>>>>>>>> + * > >>>>>>>>> + * Free a list of mbufs by calling rte_pktmbuf_free() in a loop as > >>>>>>>>> a wrapper > >>>>>>>>> function. > >>>>>>>>> + * > >>>>>>>>> + * @param m_list > >>>>>>>>> + * An array of rte_mbuf pointers to be freed. > >>>>>>>>> + * @param npkts > >>>>>>>>> + * Number of packets to free in list. > >>>>>>>>> + */ > >>>>>>>>> +static inline void rte_pktmbuf_free_bulk(struct rte_mbuf > >>>>>>>>> *m_list[], int16_t > >>>>>>>>> npkts) > >>>>>>>>> +{ > >>>>>>>>> + while(npkts--) > >>>>>>>>> + rte_pktmbuf_free(*m_list++); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> #ifdef RTE_MBUF_REFCNT > >>>>>>>>> > >>>>>>>>> /** > >>>>>>>>> -- > >>>>>>>>> 2.1.0 > >>>>>>>> > >>>>>>> > >>>>>>> Keith Wiles, Principal Technologist with CTO office, Wind River > >>>>>>> mobile 972-213-5533 > >>>> > >>>> Keith Wiles, Principal Technologist with CTO office, Wind River mobile > >>>> 972-213-5533 > >>> > >>> Keith Wiles, Principal Technologist with CTO office, Wind River mobile > >>> 972-213-5533 > >> > >> Keith Wiles, Principal Technologist with CTO office, Wind River mobile > >> 972-213-5533 > > Keith Wiles, Principal Technologist with CTO office, Wind River mobile > 972-213-5533 >