On 29 May 2015 at 13:55, Zoltan Kiss <zoltan.k...@linaro.org> wrote:
>
> On 28/05/15 17:40, Ola Liljedahl wrote:
>
>> On 28 May 2015 at 17:23, Zoltan Kiss <zoltan.k...@linaro.org> wrote:
>>
>>     On 28/05/15 16:00, Ola Liljedahl wrote:
>>
>>         I disapprove of this solution. TX completion processing
>>         (cleaning TX descriptor rings after transmission has completed)
>>         is an implementation (hardware) aspect and should be hidden
>>         from the application.
>>
>>     Unfortunately you can't hide it if you want your pktio application
>>     to work with poll mode drivers. In that case the TX completion
>>     interrupt can be disabled and the application has to take care of
>>     completion itself. In the case of DPDK you just call the send
>>     function (with 0 packets, if you don't have anything to send at the
>>     time).
>>
>> Why do you have to retire transmitted packets if you are not
>> transmitting new packets (and thus don't need those descriptors in the
>> TX ring)?
>>
> Because otherwise they are a memory leak.
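As an illustration of the zero-packet "nudge" described above, here is a
minimal sketch at the DPDK level. rte_eth_tx_burst() is the real DPDK burst
API, but the port/queue ids are placeholders, and whether a zero-length
burst actually reclaims TX descriptors depends on the PMD and TX path in
use; treat this as a sketch, not a guarantee.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Give the driver a chance to scan its TX ring and free mbufs whose
     * transmission has already completed, even when there is nothing new
     * to send. PMD-dependent: some TX paths only clean up when real
     * packets are enqueued. */
    static void nudge_tx_completion(uint16_t port_id, uint16_t queue_id)
    {
            struct rte_mbuf *none[1] = { NULL };

            (void)rte_eth_tx_burst(port_id, queue_id, none, 0);
    }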
They are not leaked! They are still in the TX ring, just waiting to be
retired.

> Those buffers might be needed somewhere else. If they are only released
> the next time you send/receive packets, you are in trouble, because that
> might never happen. Especially when that event is blocked because your
> TX ring is full of unreleased packets.

Having too few buffers is always a problem. You don't want too large RX/TX
rings either, because that just increases buffering and latency (the
"buffer bloat" problem).

>
>> Does the application have too few packets in the pool so that reception
>> will suffer?
>
> Let me approach the problem from a different angle: the current
> workaround is that you have to allocate a pool with _loooads_ of buffers,
> so you have a good chance of never running out of free buffers. Probably.
> Because it still doesn't guarantee that there will be a next send/receive
> event on that interface to release the packets.
>
>>         There isn't any corresponding call that refills the RX
>>         descriptor rings with fresh buffers.
>>
>>     You can do that in the receive function, I think that's how the
>>     drivers generally do it.
>>
>>         The completion processing can be performed from any ODP call,
>>         not necessarily odp_pktio_send().
>>
>>     I think "any" is not specific enough. Which one?
>>
>> odp_pktio_recv, odp_schedule. Wherever the application blocks or busy
>> waits for more packets.
>
> We do that already in odp_pktio_recv. It doesn't help, because you can
> only release the buffers held in the current interface's TX ring. You
> can't do anything about other interfaces.

Why not? There is no guarantee that the application thread calling
odp_pktio_recv() on one interface is the only one transmitting on that
specific egress interface. In the general case, all threads may be using
all pktio interfaces for both reception and transmission.

> I mean, you could trigger TX completion on every interface every time you
> receive on one, but that would be a scalability nightmare.

Maybe not every time. I expect a more creative solution than this. Perhaps
when you run out of buffers in the pool? (See the sketch at the end of this
mail.)

>
>>     Can you provide a vague draft of how you would fix the l2fwd example
>>     below?
>>
>> I don't think anything needs fixing on the application level.
>
> Wrong. odp_l2fwd uses one packet pool, receives from pktio_src and then,
> if there is anything received, sends it out on pktio_dst.

This specific application has this specific behavior. Are you sure this is
a general solution? I am not.

> Let's say the pool has 576 elements and the interfaces use 256 RX and
> 256 TX descriptors. You start with 2*256 buffers kept in the two RX
> rings. Let's say you receive the first 64 packets; you refill the RX ring
> immediately, so now you're out of buffers. You can send out those 64, but
> in the next iteration odp_pktio_recv() will return 0 because it can't
> refill the RX descriptors (and the driver won't give you back any buffer
> unless you can refill it). And now you are in an infinite loop: recv will
> always return 0, because you never release the packets.

The size of the pool should somehow be correlated with the sizes of the RX
and TX rings for "best performance" (whatever that means). But I also
think that the system should function regardless of RX/TX ring sizes and
pool size, "function" meaning not deadlock.

> There are several ways to fix this:
> - tell the application writer that if you see deadlocks, increase the
>   number of elements in the buffer pool.
>   I doubt anyone would ever use ODP for anything serious after seeing
>   such a thing.
> - you can't really give anything more specific than in the previous
>   point, because details such as RX/TX descriptor counts are abstracted
>   away, intentionally. And your platform can't autotune them, because it
>   doesn't know how many elements you have in the pool used for TX. In
>   fact, it could be more than just one pool.
> - make sure that you run odp_pktio_send even if pkts == 0. In the case of
>   ODP-DPDK that can help, because it actually triggers TX completion.
>   Actually, we can make odp_pktio_send_complete() == odp_pktio_send(len=0),
>   so we don't have to introduce a new function. But that doesn't change
>   the fact that we have to call TX completion periodically to make sure
>   nothing is blocked.

So why doesn't the ODP-for-DPDK implementation call TX completion
"periodically", or at some other suitable times?

> - or we can just do what I proposed in the patch, which is very similar
>   to the previous point, but articulates the importance of TX completion
>   more.

Which is a platform specific problem and exactly the kind of thing that the
ODP API should hide and not expose.

>
>> -- Ola
>>
>> On 28 May 2015 at 16:38, Zoltan Kiss <zoltan.k...@linaro.org> wrote:
>>
>>     A pktio interface can be used with poll mode drivers, where TX
>>     completion often has to be done manually. This turned up as a
>>     problem with ODP-DPDK and odp_l2fwd:
>>
>>         while (!exit_threads) {
>>                 pkts = odp_pktio_recv(pktio_src,...);
>>                 if (pkts <= 0)
>>                         continue;
>>                 ...
>>                 if (pkts_ok > 0)
>>                         odp_pktio_send(pktio_dst, pkt_tbl, pkts_ok);
>>                 ...
>>         }
>>
>>     In this example we never call odp_pktio_send() on pktio_dst if there
>>     weren't any new packets received on pktio_src. DPDK needs manual TX
>>     completion. The above example should have an
>>     odp_pktio_send_complete(pktio_dst) right at the beginning of the
>>     loop.
>>
>>     Signed-off-by: Zoltan Kiss <zoltan.k...@linaro.org>
>>     ---
>>      include/odp/api/packet_io.h | 16 ++++++++++++++++
>>      1 file changed, 16 insertions(+)
>>
>>     diff --git a/include/odp/api/packet_io.h b/include/odp/api/packet_io.h
>>     index b97b2b8..3a4054c 100644
>>     --- a/include/odp/api/packet_io.h
>>     +++ b/include/odp/api/packet_io.h
>>     @@ -119,6 +119,22 @@ int odp_pktio_recv(odp_pktio_t pktio, odp_packet_t pkt_table[], int len);
>>      int odp_pktio_send(odp_pktio_t pktio, odp_packet_t pkt_table[], int len);
>>
>>      /**
>>     + * Release sent packets
>>     + *
>>     + * This function should be called after sending on a pktio. If the platform
>>     + * doesn't implement send completion in other ways, this function should call
>>     + * odp_packet_free() on packets where transmission is already completed. It can
>>     + * be a no-op if the platform guarantees that the packets will be released upon
>>     + * completion, but the application must call it periodically after send to make
>>     + * sure packets are released.
>>     + *
>>     + * @param pktio  ODP packet IO handle
>>     + *
>>     + * @retval <0 on failure
>>     + */
>>     +int odp_pktio_send_complete(odp_pktio_t pktio);
>>     +
>>     +/**
>>      * Set the default input queue to be associated with a pktio handle
>>      *
>>      * @param pktio  ODP packet IO handle
>>     --
>>     1.9.1
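To make the commit message concrete, here is a sketch of the odp_l2fwd loop
amended as suggested, with the proposed odp_pktio_send_complete() called at
the top of each iteration. It assumes the API proposed in the patch above
(not something that exists today), and that pktio_src, pktio_dst, pkt_tbl,
MAX_PKT_BURST and drop_err_pkts() are as in the existing odp_l2fwd example.

    while (!exit_threads) {
            /* Retire packets already sent on pktio_dst so their buffers
             * go back to the pool even when nothing new arrives. With a
             * 576-element pool and 256-descriptor RX/TX rings this is
             * what lets odp_pktio_recv() refill its ring again once the
             * remaining free buffers are all parked in the TX ring. */
            odp_pktio_send_complete(pktio_dst);

            pkts = odp_pktio_recv(pktio_src, pkt_tbl, MAX_PKT_BURST);
            if (pkts <= 0)
                    continue;

            /* Drop packets with errors; pkts_ok is the number kept. */
            pkts_ok = drop_err_pkts(pkt_tbl, pkts);
            if (pkts_ok > 0)
                    odp_pktio_send(pktio_dst, pkt_tbl, pkts_ok);
    }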
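For completeness, a rough sketch of the implementation-side alternative
argued for above: ODP-for-DPDK reclaiming TX buffers itself when a pool
runs dry, so the application never has to know. Everything below is
hypothetical implementation detail (the pktio registry, pool_alloc(),
odp_pktio_send_complete_internal()); only odp_packet_alloc()'s signature
and ODP_PACKET_INVALID come from the ODP API.

    #include <odp.h>

    #define MAX_PKTIOS 64                       /* made-up bound */

    /* Hypothetical registry of open interfaces, filled in by the
     * implementation when a pktio is opened. */
    static odp_pktio_t open_pktios[MAX_PKTIOS];
    static int num_open_pktios;

    /* Walk all open interfaces and retire completed TX descriptors so
     * their buffers return to the pools. odp_pktio_send_complete_internal()
     * is a hypothetical implementation-internal helper. */
    static void reclaim_tx_buffers_all(void)
    {
            int i;

            for (i = 0; i < num_open_pktios; i++)
                    odp_pktio_send_complete_internal(open_pktios[i]);
    }

    odp_packet_t odp_packet_alloc(odp_pool_t pool, uint32_t len)
    {
            odp_packet_t pkt = pool_alloc(pool, len);  /* hypothetical */

            if (pkt == ODP_PACKET_INVALID) {
                    /* Pool exhausted: reclaim sent buffers everywhere,
                     * then retry once. */
                    reclaim_tx_buffers_all();
                    pkt = pool_alloc(pool, len);
            }
            return pkt;
    }

The same hook could equally live in odp_pktio_recv() or odp_schedule(),
which is the "wherever the application blocks" point made earlier in the
thread.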