On Thu, Dec 7, 2017 at 12:22 PM, Michal Mazur <michal.ma...@linaro.org>
wrote:

> The native VPP+DPDK plugin knows the size of the rte_mbuf header and
> subtracts it from the vlib buffer pointer:
>
> struct rte_mbuf *mb0 = rte_mbuf_from_vlib_buffer (b0);
> #define rte_mbuf_from_vlib_buffer(x) (((struct rte_mbuf *)x) - 1)
>

No surprise that VPP is a DPDK application, but I thought they wanted to be
independent of DPDK. The problem is that ODP is never going to match DPDK
at an ABI level on x86 so we can't be fixated on x86 performance
comparisons between ODP4VPP and VPP/DPDK. What we need to do is compare
ODP4VPP on Arm-based SoCs vs. "native VPP" that can't take advantage of the
HW acceleration present on those platforms. That's how we get to show
dramatic differences. If ODP4VPP is only within a few percent (plus or
minus) of VPP/DPDK, there's no point in doing the project at all.

So my advice would be to stash the handle in the VLIB buffer for now and
focus on exploiting the native IPsec acceleration capabilities that ODP
will permit.
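
For reference, here's a minimal sketch of the "stash the handle" approach,
assuming the handle is kept in the second-cacheline opaque2 area of
vlib_buffer_t (helper names are mine, not actual odp4vpp code):

#include <vlib/vlib.h>   /* vlib_buffer_t, clib_memcpy */
#include <odp_api.h>     /* odp_packet_t */

/* Store the ODP handle when a packet is handed to VPP ... */
static inline void
odp4vpp_stash_handle (vlib_buffer_t * b, odp_packet_t pkt)
{
  clib_memcpy (b->opaque2, &pkt, sizeof (pkt));
}

/* ... and read it back when the packet is sent back to ODP. */
static inline odp_packet_t
odp4vpp_unstash_handle (vlib_buffer_t * b)
{
  odp_packet_t pkt;
  clib_memcpy (&pkt, b->opaque2, sizeof (pkt));
  return pkt;
}

That costs the extra cacheline access Michal measured, but it needs no new
ODP API and works with any ODP implementation.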


> On 7 December 2017 at 19:02, Bill Fischofer <bill.fischo...@linaro.org>
> wrote:
>
>> Ping to others on the mailing list for opinions on this. What does
>> "native" VPP+DPDK get and how is this problem solved there?
>>
>> On Thu, Dec 7, 2017 at 11:55 AM, Michal Mazur <michal.ma...@linaro.org>
>> wrote:
>>
>>> The _odp_packet_inline struct is common to all packets and takes up to
>>> two cachelines (it contains only offsets). Reading a pointer for each
>>> packet from the VLIB buffer would require fetching 10 million cachelines
>>> per second. Using prefetches does not help.
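>>>
>>> To make the cache behaviour explicit, here is a rough sketch with
>>> made-up names (not the real odp-dpdk definitions):
>>>
>>> #include <stdint.h>
>>>
>>> typedef struct odp_packet_s *odp_packet_t;  /* opaque packet handle */
>>>
>>> /* One global table of offsets shared by all packets; every packet reads
>>>  * the same few bytes, so it stays resident in cache. */
>>> static const struct { uint16_t udata; } pkt_inline = { .udata = 128 };
>>>
>>> /* Option A: arithmetic against the shared table - no per-packet
>>>  * cacheline is touched. */
>>> static inline odp_packet_t pkt_from_uarea(void *uarea)
>>> {
>>>     return (odp_packet_t)((uintptr_t)uarea - pkt_inline.udata);
>>> }
>>>
>>> /* Option B: a handle stashed per packet in the VLIB buffer - one extra
>>>  * cacheline fetch per packet, i.e. ~10 million fetches/s at ~10 Mpps. */
>>> static inline odp_packet_t pkt_from_stash(odp_packet_t *stash_slot)
>>> {
>>>     return *stash_slot;
>>> }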
>>>
>>> On 7 December 2017 at 18:37, Bill Fischofer <bill.fischo...@linaro.org>
>>> wrote:
>>>
>>>> Yes, but _odp_packet_inline.udata is clearly not in the VLIB cache line
>>>> either, so it's a separate cache line access. Are you seeing this
>>>> difference in real runs or in microbenchmarks? Why isn't the entire VLIB
>>>> buffer being prefetched at dispatch? Sequential prefetching should add
>>>> negligible overhead.
>>>>
>>>> On Thu, Dec 7, 2017 at 11:13 AM, Michal Mazur <michal.ma...@linaro.org>
>>>> wrote:
>>>>
>>>>> It seems that only the first cache line of the VLIB buffer is in L1;
>>>>> a new pointer could be placed only in the second cacheline.
>>>>> Using a constant offset between the user area and the ODP header I get
>>>>> 11 Mpps, with the pointer stored in the VLIB buffer only 10 Mpps, and
>>>>> with this new API 10.6 Mpps.
>>>>>
>>>>> On 7 December 2017 at 18:04, Bill Fischofer <bill.fischo...@linaro.org
>>>>> > wrote:
>>>>>
>>>>>> How would calling an API be better than referencing the stored data
>>>>>> yourself? A cache line reference is a cache line reference, and 
>>>>>> presumably
>>>>>> the VLIB buffer is already in L1 since it's your active data.
>>>>>>
>>>>>> On Thu, Dec 7, 2017 at 10:45 AM, Michal Mazur <
>>>>>> michal.ma...@linaro.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> For the odp4vpp plugin we need a new API function which, given a
>>>>>>> user area pointer, returns a pointer to the ODP packet buffer. It is
>>>>>>> needed when packets processed by VPP are sent back to ODP and only a
>>>>>>> pointer to the VLIB buffer data (stored inside the user area) is
>>>>>>> known.
>>>>>>>
>>>>>>> I have tried storing the ODP buffer pointer in the VLIB data, but
>>>>>>> reading it for every packet lowers performance by 800 kpps.
>>>>>>>
>>>>>>> For the odp-dpdk implementation it could look like this:
>>>>>>>
>>>>>>> /**
>>>>>>>  * @internal Get the packet handle from the packet's user area pointer
>>>>>>>  * @param uarea  User area pointer of the packet
>>>>>>>  * @return Handle of the packet that owns the user area
>>>>>>>  */
>>>>>>> static inline odp_packet_t _odp_packet_from_user_area(void *uarea)
>>>>>>> {
>>>>>>>         return (odp_packet_t)((uintptr_t)uarea - _odp_packet_inline.udata);
>>>>>>> }
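>>>>>>>
>>>>>>> A usage sketch from the VPP side could then be (hypothetical, and
>>>>>>> assuming the plugin places the vlib_buffer_t at offset 0 of the
>>>>>>> packet's user area; vm, bi0 and pktout come from the surrounding
>>>>>>> node and pktio setup):
>>>>>>>
>>>>>>> /* Send a packet back to ODP: recover the ODP handle directly from
>>>>>>>  * the VLIB buffer pointer, with no per-packet lookup. */
>>>>>>> vlib_buffer_t *b0 = vlib_get_buffer (vm, bi0);
>>>>>>> odp_packet_t pkt = _odp_packet_from_user_area (b0);
>>>>>>> odp_pktout_send (pktout, &pkt, 1);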
>>>>>>>
>>>>>>> Please let me know what you think.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Michal
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
