[dpdk-users] Beginners question: rte_eth_tx_burst, rte_mbuf access synchronization

2016-11-11 Thread Anupam Kapoor
On Fri, Nov 11, 2016 at 3:19 PM, Philipp Beyer 
wrote:

> Basically, I need to send the same packet over a single interface, over and
> over again, with single bytes changed each time.
> I use rte_eth_tx_burst to send 16 packets at once. As I want to re-use the
> same buffers in a very simple way, I just increment the refcnt
> accordingly.
>

just throwing it out there: have you considered a trivial scheme of
repeatedly invoking 'rte_eth_tx_burst(...)' until a value less than
'nb_pkts' is returned? once you reach that state, the reuse can
happen...
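A minimal sketch of the usual retry loop, in plain C so it runs without DPDK: `sim_tx_burst()` and `struct sim_txq` are hypothetical stand-ins that, like `rte_eth_tx_burst()`, may accept fewer packets than requested when the TX ring is full. The loop passes `pkts + sent` so packets that were already accepted are not queued a second time. Keep in mind that "queued" does not mean "sent": the accepted buffers still belong to the driver afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TX queue model (not DPDK): accepts at most 'slots_per_call'
 * packets per call, mimicking a partially full descriptor ring. */
struct sim_txq {
    uint16_t slots_per_call;
    uint32_t queued;            /* total packets accepted so far */
};

static uint16_t sim_tx_burst(struct sim_txq *q, void **pkts, uint16_t nb_pkts)
{
    (void)pkts;                 /* a real driver would record the pointers */
    uint16_t n = nb_pkts < q->slots_per_call ? nb_pkts : q->slots_per_call;
    q->queued += n;
    return n;
}

/* Keep calling until all n packets are queued; returns the number of calls. */
static unsigned send_all(struct sim_txq *q, void **pkts, uint16_t n)
{
    uint16_t sent = 0;
    unsigned calls = 0;

    assert(q->slots_per_call > 0);  /* otherwise the loop would never end */
    while (sent < n) {
        /* Only offer the packets that have not been accepted yet. */
        sent = (uint16_t)(sent + sim_tx_burst(q, pkts + sent,
                                              (uint16_t)(n - sent)));
        calls++;
    }
    return calls;
}
```

With `slots_per_call = 5`, a 16-packet burst is queued in four calls (5 + 5 + 5 + 1). A real application might prefer to bound the number of retries rather than spin indefinitely.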

--
kind regards
anupam

In the beginning was the lambda, and the lambda was with Emacs, and Emacs
was the lambda.


[dpdk-users] Beginners question: rte_eth_tx_burst, rte_mbuf access synchronization

2016-11-11 Thread Philipp Beyer
Hi Matt,

Thanks for your answers. This helps as I am still stabbing in the dark 
quite a lot.

I actually use 16 distinct buffers, not one, per burst.
But still, your conclusion is correct: I mess with the refcount, adjust
the payload after calling rte_eth_tx_burst, and therefore get undefined
behaviour.

Your answer pretty much sounds like you understood my point, so it seems
the solution I am looking for does not exist. Unfortunately, it is not
really only one byte I am changing. That was just a simplification; it's
actually a few bytes, but still a small portion of the payload. So your
idea won't really work.

But I might have found another idea: what about preparing all buffers of
a memory pool with the same payload? I should then get a pre-filled
buffer from rte_pktmbuf_alloc, right? Let's say I initialize a buffer
for transmission, the transmitting code frees this buffer, and I get
the same buffer back from rte_pktmbuf_alloc. What do I have to
re-initialize to have the same buffer again? Only the payload length? Is
this approach feasible, based on documented/specified behaviour?


Philipp


On 11.11.2016 at 14:45, Matt Laswell wrote:
> Hi Philipp,
>
> I'm a little unclear what you mean by your comments about adjusting
> the refcnt in your mbufs.  You are absolutely correct that
> rte_eth_tx_burst doesn't synchronously transmit the packets.  Instead, 
> it puts them in a ring that is serviced by the poll mode driver.  
> Eventually, they are handed off to the NIC, which copies them into its 
> buffer and ultimately sends them on the wire.
>
> The architecture you've described won't work for the reasons you've
> surmised: when you hand a pointer to the packet to the device driver,
> you are giving it control of the memory pointed to.  If you continue 
> to modify its contents at that point, the results will be 
> unpredictable.  Also, it sounds as though you might really just have 
> 16 pointers to a single packet, with a reference count of 16.  Since 
> you don't actually have 16 buffers, if you modify the contents of any 
> one packet, you're modifying them all.
>
> Let me suggest that you might want to rethink your scheme. Rather than 
> trying to reverse engineer a way to either make the PMD behave 
> synchronously or to give you a callback, I would consider prebuilding 
> packet contents at init time, then allocating mbufs and copying the 
> contents in.  I suspect you've avoided an approach like this because 
> you'd like to not copy mostly the same data over and over when you 
> only want to modify one byte.
>
> An alternative approach would be to use indirect mbufs.  In essence, 
> each packet you want to send might be made up of three mbufs.  The 
> first is an indirect mbuf that points to one that contains the common 
> data at the start of your packets. The second contains the one byte 
> that you wish to change.  The third is an indirect mbuf that points to 
> the common data at the end of your packets.  I haven't used this 
> approach myself, but I suspect it would let you avoid copying so much 
> data.
>
> - Matt
>
> On Fri, Nov 11, 2016 at 3:49 AM, Philipp Beyer wrote:
>
> Hi!
>
> I am just writing my first code using dpdk, a traffic generator,
> for which I started with the l2fwd example.
>
> Basically, I need to send the same packet over a single interface,
> over and over again, with single bytes changed each time.
> I use rte_eth_tx_burst to send 16 packets at once. As I want to
> re-use the same buffers in a very simple way, I just increment the
> refcnt
> accordingly.
>
> My current code prepares all 16 buffers, calls rte_eth_tx_burst
> until all 16 packets are stored in the transmit ring, and starts
> over again, adjusting the buffers to send the next 16 packets.
>
> Currently I observe duplicate packets, although every packet
> should be individual due to single byte adjustments.
>
> My current problem, I guess, is that rte_eth_tx_burst does not
> synchronously transmit the count of packets, which is returned to
> the caller, but just stores them in the transmit queue. So, I am not
> allowed to instantly re-use these buffers again.
>
> My question is: how do I know when to re-use buffers passed to
> rte_eth_tx_burst? Of course, I can check their refcnt member, and
> this would be perfectly fine. Presumably, I should have at least
> BURST_SIZE*2 buffers, passing BURST_SIZE buffers at once, so I can
> manipulate one set of buffers while the other is transmitted. But
> I am missing the idea of the best synchronization scheme here: How
> should I wait on this refcnt to drop?
>
> Some blind guessing:
> If I take the documentation of rte_eth_tx_burst literally, I could
> get the idea that refcounts of buffers are only decreased (buffers
> are only 'freed') while rte_eth_tx_burst is executing, but one function
>

[dpdk-users] pmdinfogen issues: cross compilation for ARM fails with older host compiler

2016-11-11 Thread Jan Viktorin
Hello all,

On Fri, 11 Nov 2016 10:34:39 +
Hemant Agrawal  wrote:

> Hi Neil,
> Pmdinfogen is compiled with the host compiler. It uses
> rte_byteorder.h of the target platform.

This seems weird to me... why is it so? I couldn't find any usage of
rte_byteorder.h in the source of pmdinfogen
(what am I missing?). Why is it included there?

pmdinfogen executes on the host but works with the (cross-compiled) target
binaries. Is that right? If the tool
needs to know the endianness, then we probably need a header telling just the
target's endianness (or other metadata).

> However, if the host compiler is older than 4.8, it will be an issue during
> cross compilation for some platforms.
> E.g. if we are compiling on an x86 host for ARM, the x86 host compiler will not
> understand the ARM asm instructions.

This is not the actual issue. Consider an ARM build server that cross-compiles
DPDK for Intel x86 (I admit that this
is quite a ridiculous situation, so take it easy ;)). Then we have exactly the
opposite issue... Would we like to fill the
DPDK x86 code base with #ifdef...#endif every time there is some assembly code?
I'd just like to point out that this
single instruction is not the true source of the problem. It is like
complaining that nasm cannot compile Thumb2
instructions... No, it cannot, sorry.

> 
> /* fix missing __builtin_bswap16 for gcc older than 4.8 */
> #if !(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
> static inline uint16_t rte_arch_bswap16(uint16_t _x)
> {
>     register uint16_t x = _x;
>     asm volatile ("rev16 %0,%1"
>                   : "=r" (x)
>                   : "r" (x)
>                   );
>     return x;
> }
> #endif
> 
> One easy solution is to add a compiler platform check in this section
> of rte_byteorder.h,
> e.g.
> #if !(defined __arm__ || defined __aarch64__)
> static inline uint16_t rte_arch_bswap16(uint16_t _x)
> {
>     return (_x >> 8) | ((_x << 8) & 0xff00);
> }
> #else ...
> 
> Is there a better way to fix it?

In my opinion, this would work as a hotfix but not as a solution.

Kind regards
Jan

> 
> Regards,
> Hemant
> 
> 
> From: Michael Wildt [mailto:michael.wildt at broadcom.com]
> Sent: Wednesday, September 14, 2016 7:18 PM
> To: Hemant Agrawal 
> Cc: Thomas Monjalon ; users at dpdk.org
> Subject: Re: [dpdk-users] Cross compile for ARM64 fails due to librte_vhost 
> and pmdinfogen issues
> 
> Hi Hemant,
> 
> Thanks for the pointer to the 4.9.3 version. I haven't had issues with 4.9.2,
> but it's good to know.
> 
> I gave that one a try and it works as well, but as with 5.3, I have to be
> on Ubuntu, not RHEL6, to make it work.
> 
> Thanks,
> Michael
> 
> On Wed, Sep 14, 2016 at 3:25 AM, Hemant Agrawal <hemant.agrawal at nxp.com> wrote:
> Hi Michael,
> One of the problems I found with the Linaro gcc 4.9 toolchain for i686
> (the default one) is that it seems to be built with older kernel headers (<3.8),
> so it uses an older linux/vhost.h file.
> 
> However, we have not observed this issue with the x86_64-based toolchain on a
> 64-bit machine.
>
> https://releases.linaro.org/14.11/components/toolchain/binaries/aarch64-linux-gnu/
> 
> Regards,
> Hemant
> 
> > -----Original Message-----
> > From: users [mailto:users-bounces at dpdk.org] On Behalf Of Michael Wildt
> > Sent: Wednesday, September 14, 2016 12:05 AM
> > To: Thomas Monjalon <thomas.monjalon at 6wind.com>
> > Cc: users at dpdk.org
> > Subject: Re: [dpdk-users] Cross compile for ARM64 fails due to librte_vhost 
> > and
> > pmdinfogen issues
> >
> > Hi Thomas,
> >
> > The Linaro gcc 4.9 reports __GNUC_MINOR__ correctly; I checked with a test
> > application. It's actually 4.9.2.
> >
> > I tried a newer Linaro toolchain. It turned out to be a bit more complicated,
> > since that one does not work on RHEL6, but it was a success. With Linaro 5.3 one can
> > cross compile DPDK fine with no errors; the rte_byteorder.h file still
> > points to ARM's version, but pmdinfogen builds.
> >
> > We should probably still fix both issues, just to keep the code base clean.
> >
> > At least I have a workaround in the interim.
> >
> > Thanks for the help.
> >
> > Thanks,
> > Michael
> >
> >
> > On Tue, Sep 13, 2016 at 11:07 AM, Thomas Monjalon
> > <thomas.monjalon at 6wind.com> wrote:
> >  
> > > 2016-09-13 07:45, Michael Wildt:  
> > > > Hi Thomas,
> > > >
> > > > Appreciate the assistance. Please see inline.
> > > >
> > > >
> > > > On Tue, Sep 13, 2016 at 5:03 AM, Thomas Monjalon
> > > > <thomas.monjalon at 6wind.com> wrote:
> > > >  
> > > > > Hi,
> > > > >
> > > > > 2016-09-12 22:20, Michael Wildt:  
> > > > > > I'm attempting to cross compile DPDK on an x86 for an ARM64 target. This
> > > > > > fails in the following areas, using

[dpdk-users] pmdinfogen issues: cross compilation for ARM fails with older host compiler

2016-11-11 Thread Neil Horman
On Fri, Nov 11, 2016 at 02:48:51PM +0100, Jan Viktorin wrote:
> Hello all,
> 
> On Fri, 11 Nov 2016 10:34:39 +
> Hemant Agrawal  wrote:
> 
> > Hi Neil,
> > Pmdinfogen is compiled with the host compiler. It uses
> > rte_byteorder.h of the target platform.
> 
> This seems weird to me... why is it so? I couldn't find any usage of
> rte_byteorder.h in the source of pmdinfogen
> (what am I missing?). Why is it included there?
> 
See the CONVERT_NATIVE macro in pmdinfogen.h.  It makes use of the various
rte_[le|be]_to_cpu macros from rte_byteorder.h

> pmdinfogen executes on the host but works with the (cross-compiled)
> target binaries. Is that right? If the tool
> needs to know the endianness, then we probably need a header telling just the
> target's endianness (or other metadata).
> 
pmdinfogen works on ELF object files, and can extract the endianness from the ELF
header itself (using the e_ident[EI_DATA] field).
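To illustrate the point, here is a sketch of decoding ELF header fields from the byte order recorded in `e_ident[EI_DATA]`, with no target `rte_byteorder.h` involved. This is not pmdinfogen's actual code (its CONVERT_NATIVE macro is not reproduced here); it assumes glibc's `<elf.h>`, and `elf_u16()` and `elf64_machine()` are hypothetical helpers. Because the bytes are assembled explicitly, the result is the same whatever the host CPU is.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 16-bit field using the byte order declared in e_ident[EI_DATA],
 * independently of the host's own endianness. */
static uint16_t elf_u16(const unsigned char *ident, const unsigned char *p)
{
    if (ident[EI_DATA] == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);
    assert(ident[EI_DATA] == ELFDATA2LSB);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: e_machine of an ELF64 header held in memory. */
static uint16_t elf64_machine(const unsigned char *hdr)
{
    assert(memcmp(hdr, ELFMAG, SELFMAG) == 0);   /* "\177ELF" magic */
    return elf_u16(hdr, hdr + offsetof(Elf64_Ehdr, e_machine));
}
```

The same AArch64 header decodes to `EM_AARCH64` whether it is stored big- or little-endian, which is all a host tool needs.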

> > However, if the host compiler is older than 4.8, it will be an issue during
> > cross compilation for some platforms.
> > E.g. if we are compiling on an x86 host for ARM, the x86 host compiler will not
> > understand the ARM asm instructions.
> 
> This is not the actual issue. Consider an ARM build server that
> cross-compiles DPDK for Intel x86 (I admit that this
> is quite a ridiculous situation, so take it easy ;)). Then we have exactly the
> opposite issue... Would we like to fill the
> DPDK x86 code base with #ifdef...#endif every time there is some assembly
> code? I'd just like to point out that this
> single instruction is not the true source of the problem. It is like
> complaining that nasm cannot compile Thumb2
> instructions... No, it cannot, sorry.
> 
It sounds like the issue is a general 'how to get support for another arch'
question.  In the case of rte_byteorder.h, it's actually pretty cut and dried,
because thankfully all the instructions are wrapped up into nice C inline
functions or macros.  The trick is to simply define the API functions in the
file for each arch, with a default generic case that just uses C, so it can be
compiled into whatever the target arch needs (although it may run more slowly).
That gets you initial support, and then you can optimize by creating a special
case for the new arch.  You have to do that for every API set that has per-arch
optimizations (the atomic ops, the tsc ops, memcpy, cpuflags, prefetch, etc).
It's time-consuming, but it's just the way it is.

> > 
> > /* fix missing __builtin_bswap16 for gcc older than 4.8 */
> > #if !(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
> > static inline uint16_t rte_arch_bswap16(uint16_t _x)
> > {
> >     register uint16_t x = _x;
> >     asm volatile ("rev16 %0,%1"
> >                   : "=r" (x)
> >                   : "r" (x)
> >                   );
> >     return x;
> > }
> > #endif
> > 
> > One easy solution is to add a compiler platform check in this
> > section of rte_byteorder.h,
> > e.g.
> > #if !(defined __arm__ || defined __aarch64__)
> > static inline uint16_t rte_arch_bswap16(uint16_t _x)
> > {
> >     return (_x >> 8) | ((_x << 8) & 0xff00);
> > }
> > #else ...
> > 
> > Is there a better way to fix it?
> 
Well, almost. What you have above is a good solution, but it shouldn't be tied to
ARM; it should be the code used whenever an arch-specific variant of the code
isn't defined. The pattern rte_byteorder.h should follow is:

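#if defined(__i386__) || defined(__x86_64__)
/* x86-specific implementation */
#elif defined(__powerpc__) || defined(__powerpc64__)
/* PowerPC-specific implementation */
#else
/* generic C implementation, used for any other arch */
#endif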

The idea is to have a generic version that works for any arch to fall back on,
then if you have a faster way to do it on your arch, you can add a clause at
your leisure to do so.
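The pattern above, filled in as a compilable sketch: a plain-C generic version that any compiler, host or cross, can build, plus one optional per-arch clause. `sketch_bswap16()` and `generic_bswap16()` are hypothetical names, not DPDK's; the arch clause simply uses GCC's `__builtin_bswap16` (available from GCC 4.8) as a stand-in for whatever hand-tuned code an arch might add. `__x86_64__`, `__i386__` and `__powerpc__` are GCC's predefined arch macros.

```c
#include <stdint.h>

/* Generic C version: valid on any architecture, so a host compiler
 * building a host tool never sees target-specific assembly. */
static inline uint16_t generic_bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));   /* cast drops bits above 15 */
}

#if (defined(__x86_64__) || defined(__i386__) || defined(__powerpc__)) && \
    defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
/* Optional per-arch clause: here just the GCC builtin, which the compiler
 * lowers to a native instruction where one exists. */
static inline uint16_t sketch_bswap16(uint16_t x)
{
    return __builtin_bswap16(x);
}
#else
/* Fallback for every other arch/compiler combination. */
static inline uint16_t sketch_bswap16(uint16_t x)
{
    return generic_bswap16(x);
}
#endif
```

Whichever clause is selected, `sketch_bswap16(0x1234)` yields `0x3412`; adding an optimized clause for a new arch never changes that contract.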

Neil

> In my opinion, this would work as a hotfix but not as a solution.
> 
> Kind regards
> Jan
> 
> > 
> > Regards,
> > Hemant
> > 
> > 
> > From: Michael Wildt [mailto:michael.wildt at broadcom.com]
> > Sent: Wednesday, September 14, 2016 7:18 PM
> > To: Hemant Agrawal 
> > Cc: Thomas Monjalon ; users at dpdk.org
> > Subject: Re: [dpdk-users] Cross compile for ARM64 fails due to librte_vhost 
> > and pmdinfogen issues
> > 
> > Hi Hemant,
> > 
> > Thanks for the pointer to the 4.9.3 version. I haven't had issues with 4.9.2,
> > but it's good to know.
> > 
> > I gave that one a try and it works as well, but as with 5.3, I have to
> > be on Ubuntu, not RHEL6, to make it work.
> > 
> > Thanks,
> > Michael
> > 
> > On Wed, Sep 14, 2016 at 3:25 AM, Hemant Agrawal <hemant.agrawal at nxp.com> wrote:
> > Hi Michael,
> > One of the problems I found with the Linaro gcc 4.9 toolchain for i686
> > (the default one) is that it seems to be built with older kernel headers (<3.8),
> > so it uses an older linux/vhost.h file.
> > 
> > However, we have not observed this issue with the x86_64-based toolchain on a
> > 64-bit machine.
> >  
> > 

[dpdk-users] Beginners question: rte_eth_tx_burst, rte_mbuf access synchronization

2016-11-11 Thread Philipp Beyer
Hi!

I am just writing my first code using dpdk, a traffic generator, for 
which I started with the l2fwd example.

Basically, I need to send the same packet over a single interface, over
and over again, with single bytes changed each time.
I use rte_eth_tx_burst to send 16 packets at once. As I want to re-use 
the same buffers in a very simple way, I just increment the refcnt
accordingly.

My current code prepares all 16 buffers, calls rte_eth_tx_burst until 
all 16 packets are stored in the transmit ring, and starts over again, 
adjusting the buffers to send the next 16 packets.

Currently I observe duplicate packets, although every packet should be 
individual due to single byte adjustments.

My current problem, I guess, is that rte_eth_tx_burst does not
synchronously transmit the count of packets, which is returned to the
caller, but just stores them in the transmit queue. So, I am not allowed to
instantly re-use these buffers again.

My question is: how do I know when to re-use buffers passed to
rte_eth_tx_burst? Of course, I can check their refcnt member, and this
would be perfectly fine. Presumably, I should have at least BURST_SIZE*2
buffers, passing BURST_SIZE buffers at once, so I can manipulate one set
of buffers while the other is transmitted. But I am missing the idea of
the best synchronization scheme here: how should I wait on this refcnt
to drop?
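To make the ownership rule in this question concrete, here is a small simulation of the two-bank scheme in plain C (no DPDK; `struct sim_buf`, `struct bank` and the `bank_*` functions are hypothetical). It only shows the invariant: a bank may be rewritten once every extra reference taken at transmit time has been dropped. In DPDK the real counter would be read with `rte_mbuf_refcnt_read()`, and, as far as I understand typical PMDs, the driver only drops its references when it reclaims completed descriptors, often inside a later `rte_eth_tx_burst()` call, so spinning on the counter without transmitting again may never make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BURST_SIZE 16

/* Hypothetical, simplified stand-in for an mbuf (not struct rte_mbuf). */
struct sim_buf {
    uint16_t refcnt;        /* 1 = only the application holds it */
    uint8_t  payload[64];   /* bytes the application rewrites */
};

/* One "bank" of BURST_SIZE buffers; the scheme alternates two of them. */
struct bank {
    struct sim_buf buf[BURST_SIZE];
};

static void bank_init(struct bank *b)
{
    for (int i = 0; i < BURST_SIZE; i++)
        b->buf[i].refcnt = 1;
}

/* Queueing a bank: the application keeps its own reference, so each
 * buffer gains one extra reference held by the (simulated) driver. */
static void bank_transmit(struct bank *b)
{
    for (int i = 0; i < BURST_SIZE; i++)
        b->buf[i].refcnt++;
}

/* Simulated TX completion: the driver drops the references it took. */
static void bank_complete(struct bank *b)
{
    for (int i = 0; i < BURST_SIZE; i++) {
        assert(b->buf[i].refcnt > 1);
        b->buf[i].refcnt--;
    }
}

/* A bank's payload may be rewritten only when no reference other than
 * the application's own is outstanding. */
static bool bank_reusable(const struct bank *b)
{
    for (int i = 0; i < BURST_SIZE; i++)
        if (b->buf[i].refcnt != 1)
            return false;
    return true;
}
```

While bank 0 is in flight, only bank 1 may be touched; bank 0 becomes writable again only after its completion has been observed.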

Some blind guessing:
If I take the documentation of rte_eth_tx_burst literally, I could get 
the idea that refcounts of buffers are only decreased (buffers are only
'freed') while rte_eth_tx_burst is executing, but one function call
might free buffers used by previous function calls. If this is correct, 
I still do not see a complete synchronization scheme. There is still a 
chance that I end up without any buffers left, which means I do not have 
a chance to call rte_eth_tx_burst again to free buffers.

Thanks for any help,
Philipp



[dpdk-users] pmdinfogen issues: cross compilation for ARM fails with older host compiler

2016-11-11 Thread Hemant Agrawal
Hi Neil,
Pmdinfogen is compiled with the host compiler. It uses
rte_byteorder.h of the target platform.
However, if the host compiler is older than 4.8, it will be an issue during
cross compilation for some platforms.
E.g. if we are compiling on an x86 host for ARM, the x86 host compiler will not
understand the ARM asm instructions.

/* fix missing __builtin_bswap16 for gcc older than 4.8 */
#if !(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
static inline uint16_t rte_arch_bswap16(uint16_t _x)
{
    register uint16_t x = _x;
    asm volatile ("rev16 %0,%1"
                  : "=r" (x)
                  : "r" (x)
                  );
    return x;
}
#endif

One easy solution is to add a compiler platform check in this section
of rte_byteorder.h,
e.g.
#if !(defined __arm__ || defined __aarch64__)
static inline uint16_t rte_arch_bswap16(uint16_t _x)
{
    return (_x >> 8) | ((_x << 8) & 0xff00);
}
#else ...

Is there a better way to fix it?

Regards,
Hemant


From: Michael Wildt [mailto:michael.wi...@broadcom.com]
Sent: Wednesday, September 14, 2016 7:18 PM
To: Hemant Agrawal 
Cc: Thomas Monjalon ; users at dpdk.org
Subject: Re: [dpdk-users] Cross compile for ARM64 fails due to librte_vhost and 
pmdinfogen issues

Hi Hemant,

Thanks for the pointer to the 4.9.3 version. I haven't had issues with 4.9.2, but
it's good to know.

I gave that one a try and it works as well, but as with 5.3, I have to be
on Ubuntu, not RHEL6, to make it work.

Thanks,
Michael

On Wed, Sep 14, 2016 at 3:25 AM, Hemant Agrawal <hemant.agrawal at nxp.com> wrote:
Hi Michael,
One of the problems I found with the Linaro gcc 4.9 toolchain for i686
(the default one) is that it seems to be built with older kernel headers (<3.8), so it
uses an older linux/vhost.h file.

However, we have not observed this issue with the x86_64-based toolchain on a 64-bit
machine.

https://releases.linaro.org/14.11/components/toolchain/binaries/aarch64-linux-gnu/

Regards,
Hemant

> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Michael Wildt
> Sent: Wednesday, September 14, 2016 12:05 AM
> To: Thomas Monjalon <thomas.monjalon at 6wind.com>
> Cc: users at dpdk.org
> Subject: Re: [dpdk-users] Cross compile for ARM64 fails due to librte_vhost 
> and
> pmdinfogen issues
>
> Hi Thomas,
>
> The Linaro gcc 4.9 reports __GNUC_MINOR__ correctly; I checked with a test
> application. It's actually 4.9.2.
>
> I tried a newer Linaro toolchain. It turned out to be a bit more complicated, since
> that one does not work on RHEL6, but it was a success. With Linaro 5.3 one can
> cross compile DPDK fine with no errors; the rte_byteorder.h file still
> points to ARM's version, but pmdinfogen builds.
>
> We should probably still fix both issues, just to keep the code base clean.
>
> At least I have a workaround in the interim.
>
> Thanks for the help.
>
> Thanks,
> Michael
>
>
> On Tue, Sep 13, 2016 at 11:07 AM, Thomas Monjalon
> <thomas.monjalon at 6wind.com> wrote:
>
> > 2016-09-13 07:45, Michael Wildt:
> > > Hi Thomas,
> > >
> > > Appreciate the assistance. Please see inline.
> > >
> > >
> > > On Tue, Sep 13, 2016 at 5:03 AM, Thomas Monjalon
> > > <thomas.monjalon at 6wind.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2016-09-12 22:20, Michael Wildt:
> > > > > I'm attempting to cross compile DPDK on an x86 for an ARM64 target. This
> > > > > fails in the following areas, using the latest DPDK as of 9/12. When
> > > > > compiling natively, there are no issues.
> > > >
> > > > Your analysis below seems good.
> > > > Interestingly, I do not see such an error (don't know why).
> > > > Please could you share the commands you are using?
> > > >
> > >
> > > Sure can.
> > >
> > > make config T=arm64-armv8a-linuxapp-gcc CROSS=/projects/ccxsw/toolchains/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu- ARCH=arm64
> > >
> > > make T=arm64-armv8a-linuxapp-gcc CROSS=/projects/ccxsw/toolchains/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu- ARCH=arm64 RTE_KERNELDIR=/projects/kernel
> > >
> > > > > - librte_vhost, fails with:
> > > > >
> > > > > /projects/dpdk_latest/lib/librte_vhost/vhost_user/virtio-net-user.c:250:23:
> > > > > error: array subscript is above array bounds [-Werror=array-bounds]
> > > > >rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
> > > > [...]
> > > > > - buildtools/pmdinfogen, fails with:
> > > > >
> > > > > == Build buildtools/pmdinfogen
> > > > >   HOSTCC pmdinfogen.o
> > > > > /projects/dpdk_test_wget/dpdk-16.07/build/include/rte_byteorder.h:
> > > > > Assembler messages:
> > > > > /projects/dpdk_test_wget/dpdk-16.07/build/include/rte_byteorder.h:53:
> > > > > Error: no such instruction: `rev16 

[dpdk-users] Beginners question: rte_eth_tx_burst, rte_mbuf access synchronization

2016-11-11 Thread Matt Laswell
Hi Philipp,

I'm a little unclear what you mean by your comments about adjusting the
refcnt in your mbufs.  You are absolutely correct that rte_eth_tx_burst
doesn't synchronously transmit the packets.  Instead, it puts them in a
ring that is serviced by the poll mode driver.  Eventually, they are handed
off to the NIC, which copies them into its buffer and ultimately sends them
on the wire.

The architecture you've described won't work for the reasons you've
surmised: when you hand a pointer to the packet to the device driver, you
are giving it control of the memory pointed to.  If you continue to modify
its contents at that point, the results will be unpredictable.  Also, it
sounds as though you might really just have 16 pointers to a single packet,
with a reference count of 16.  Since you don't actually have 16 buffers, if
you modify the contents of any one packet, you're modifying them all.

Let me suggest that you might want to rethink your scheme.  Rather than
trying to reverse engineer a way to either make the PMD behave
synchronously or to give you a callback, I would consider prebuilding
packet contents at init time, then allocating mbufs and copying the
contents in.  I suspect you've avoided an approach like this because you'd
rather not copy mostly the same data over and over when you only want to
modify one byte.
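A minimal sketch of the "prebuild, then copy" approach, with the DPDK parts left out so it compiles on its own. `dst` stands in for the data area of a freshly allocated mbuf (in DPDK presumably obtained via `rte_pktmbuf_alloc()` followed by `rte_pktmbuf_append()`); `build_packet()`, `pkt_template` and `var_off` are hypothetical. Because every packet gets its own buffer, nothing is modified after it has been handed to the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN 64

/* Packet template, built once at init time (placeholder contents). */
static uint8_t pkt_template[PKT_LEN];

static void template_init(void)
{
    for (size_t i = 0; i < PKT_LEN; i++)
        pkt_template[i] = (uint8_t)i;
}

/* Copy the template into a fresh buffer, then patch only the varying bytes.
 * 'var_off' is a hypothetical offset of a 32-bit big-endian counter. */
static void build_packet(uint8_t *dst, size_t var_off, uint32_t seq)
{
    assert(var_off + 4 <= PKT_LEN);
    memcpy(dst, pkt_template, PKT_LEN);
    dst[var_off]     = (uint8_t)(seq >> 24);
    dst[var_off + 1] = (uint8_t)(seq >> 16);
    dst[var_off + 2] = (uint8_t)(seq >> 8);
    dst[var_off + 3] = (uint8_t)seq;
}
```

Whether the copy cost matters depends on the frame size; for short frames it may well be negligible next to the rest of the per-packet work.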

An alternative approach would be to use indirect mbufs.  In essence, each
packet you want to send might be made up of three mbufs.  The first is an
indirect mbuf that points to one that contains the common data at the start
of your packets.  The second contains the one byte that you wish to
change.  The third is an indirect mbuf that points to the common data at
the end of your packets.  I haven't used this approach myself, but I
suspect it would let you avoid copying so much data.
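The three-segment idea can be modelled without DPDK: each packet is a chain of (pointer, length) segments, the head and tail point at shared read-only data, and only the middle segment is per-packet. `struct seg`, `gather()`, `common_head` and `common_tail` are hypothetical. In DPDK the shared parts would presumably be indirect mbufs attached with `rte_pktmbuf_attach()` and chained through the mbuf `next` field, which also requires the PMD to accept multi-segment packets; `gather()` merely mimics the NIC reading the chain into one frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical segment descriptor, standing in for one mbuf in a chain. */
struct seg {
    const uint8_t *data;
    size_t len;
};

/* Shared, read-only parts referenced by every packet (placeholder bytes). */
static const uint8_t common_head[] = { 0xaa, 0xbb, 0xcc, 0xdd };
static const uint8_t common_tail[] = { 0x11, 0x22 };

/* Gather a segment chain into one contiguous frame, as the NIC's DMA
 * would when transmitting a multi-segment packet. Returns the frame length. */
static size_t gather(const struct seg *chain, int nsegs, uint8_t *out, size_t cap)
{
    size_t off = 0;

    for (int i = 0; i < nsegs; i++) {
        assert(off + chain[i].len <= cap);
        memcpy(out + off, chain[i].data, chain[i].len);
        off += chain[i].len;
    }
    return off;
}
```

Two packets built this way share the head and tail memory and differ only in their one-byte middle segment, so nothing large is copied per packet on the application side.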

- Matt

On Fri, Nov 11, 2016 at 3:49 AM, Philipp Beyer 
wrote:

> Hi!
>
> I am just writing my first code using dpdk, a traffic generator, for which
> I started with the l2fwd example.
>
> Basically, I need to send the same packet over a single interface, over and
> over again, with single bytes changed each time.
> I use rte_eth_tx_burst to send 16 packets at once. As I want to re-use the
> same buffers in a very simple way, I just increment the refcnt
> accordingly.
>
> My current code prepares all 16 buffers, calls rte_eth_tx_burst until all
> 16 packets are stored in the transmit ring, and starts over again,
> adjusting the buffers to send the next 16 packets.
>
> Currently I observe duplicate packets, although every packet should be
> individual due to single byte adjustments.
>
> My current problem, I guess, is that rte_eth_tx_burst does not
> synchronously transmit the count of packets, which is returned to the
> caller, but just stores them in the transmit queue. So, I am not allowed to
> instantly re-use these buffers again.
>
> My question is: how do I know when to re-use buffers passed to
> rte_eth_tx_burst? Of course, I can check their refcnt member, and this
> would be perfectly fine. Presumably, I should have at least BURST_SIZE*2
> buffers, passing BURST_SIZE buffers at once, so I can manipulate one set of
> buffers while the other is transmitted. But I am missing the idea of the
> best synchronization scheme here: How should I wait on this refcnt to drop?
>
> Some blind guessing:
> If I take the documentation of rte_eth_tx_burst literally, I could get the
> idea that refcounts of buffers are only decreased (buffers are only 'freed')
> while rte_eth_tx_burst is executing, but one function call might free
> buffers used by previous function calls. If this is correct, I still do not
> see a complete synchronization scheme. There is still a chance that I end
> up without any buffers left, which means I do not have a chance to call
> rte_eth_tx_burst again to free buffers.
>
> Thanks for any help,
> Philipp
>
>