On 7 December 2017 at 17:36, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
>
> On Thu, Dec 7, 2017 at 3:17 PM, Honnappa Nagarahalli
> <honnappa.nagaraha...@linaro.org> wrote:
>>
>> This experiment clearly shows the need for providing an API in ODP.
>>
>> On ODP2.0 implementations such an API will be simple enough (constant
>> subtraction), requiring no additional storage in VLIB.
>>
>> Michal, can you send a PR to ODP for the API so that we can debate the
>> feasibility of the API for Cavium/NXP platforms.
>
>
> That's the point. An API that is tailored to a specific implementation or
> application is not what ODP is about.
>
How are requirements for ODP APIs gathered currently? My understanding
is that they come from OFP and Petri's requirements. Similarly, VPP is
also an application of ODP. Recently, the Arm community (Arm and
partners) prioritized the open source projects of importance to it and
came up with a top 50 (or 100) list. If I remember correctly, VPP is
among the top single digits (I am trying to get the exact details). So
it is an application of significant interest.

>>
>>
>> On 7 December 2017 at 14:08, Bill Fischofer <bill.fischo...@linaro.org>
>> wrote:
>> > On Thu, Dec 7, 2017 at 12:22 PM, Michal Mazur <michal.ma...@linaro.org>
>> > wrote:
>> >
>> >> The native VPP+DPDK plugin knows the size of the rte_mbuf header and
>> >> subtracts it from the vlib buffer pointer:
>> >>
>> >> struct rte_mbuf *mb0 = rte_mbuf_from_vlib_buffer (b0);
>> >> #define rte_mbuf_from_vlib_buffer(x) (((struct rte_mbuf *)x) - 1)
>> >>
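
For readers following the thread, here is a minimal stand-alone sketch of
the layout that macro assumes (stand-in struct sizes, not the real DPDK or
VPP definitions): the vlib_buffer_t metadata sits immediately after the
rte_mbuf header in the same allocation, so the conversion is a fixed
pointer subtraction with no extra memory access.

/* Stand-in types only; sizes are illustrative, not the real ones. */
#include <stdio.h>
#include <stdint.h>

struct rte_mbuf { uint8_t hdr[128]; };                /* stand-in DPDK header  */
typedef struct { uint8_t meta[128]; } vlib_buffer_t;  /* stand-in VPP metadata */

struct buf_layout {            /* one allocation: mbuf header, then vlib buffer */
    struct rte_mbuf mb;
    vlib_buffer_t   vb;
};

#define rte_mbuf_from_vlib_buffer(x) (((struct rte_mbuf *)(x)) - 1)

int main(void)
{
    struct buf_layout layout;
    vlib_buffer_t *b0 = &layout.vb;
    struct rte_mbuf *mb0 = rte_mbuf_from_vlib_buffer(b0);
    printf("mbuf recovered: %s\n", mb0 == &layout.mb ? "yes" : "no");
    return 0;
}
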
>> >
>> > No surprise that VPP is a DPDK application, but I thought they wanted
>> > to be independent of DPDK. The problem is that ODP is never going to
>> > match DPDK at an ABI level on x86, so we can't fixate on x86
>> > performance comparisons between ODP4VPP and VPP/DPDK.
>> Is there any reason why we would not be able to match or exceed the
>> performance?
>
>
> It's not that ODP can't have good performance on x86; it's that DPDK
> encourages apps to be very dependent on DPDK implementation details, such
> as the one seen here. ODP is not going to match DPDK internals, so
> applications that exploit such internals will always see a difference.
>
>>
>>
>> > What we need to do is compare ODP4VPP on Arm-based SoCs vs. "native
>> > VPP" that can't take advantage of the HW acceleration present on those
>> > platforms. That's how we get to show dramatic differences. If ODP4VPP
>> > is only within a few percent (plus or minus) of VPP/DPDK there's no
>> > point in doing the project at all.
>> >
>> > So my advice would be to stash the handle in the VLIB buffer for now and
>> > focus on exploiting the native IPsec acceleration capabilities that ODP
>> > will permit.
>> >
>> >
>> >> On 7 December 2017 at 19:02, Bill Fischofer <bill.fischo...@linaro.org>
>> >> wrote:
>> >>
>> >>> Ping to others on the mailing list for opinions on this. What does
>> >>> "native" VPP+DPDK get and how is this problem solved there?
>> >>>
>> >>> On Thu, Dec 7, 2017 at 11:55 AM, Michal Mazur
>> >>> <michal.ma...@linaro.org>
>> >>> wrote:
>> >>>
>> >>>> The _odp_packet_inline structure is common to all packets and takes
>> >>>> up to two cachelines (it contains only offsets). Reading a pointer
>> >>>> from the VLIB buffer for each packet would require fetching 10
>> >>>> million cachelines per second. Using prefetches does not help.
>> >>>>
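
To make the two access patterns being compared concrete, here is a hedged
sketch with stand-in types and offsets (not odp-dpdk internals): the shared
offset structure is read-mostly and reused for every packet, while a pointer
stored per packet in the VLIB buffer adds one extra, potentially cold,
cache-line load per packet, roughly 10 million loads per second at 10 Mpps.

#include <stdint.h>

typedef void *odp_packet_t;

/* Shared by all packets; read-mostly, so it stays hot in L1. */
static const struct { uintptr_t udata; } pkt_inline = { .udata = 128 };

/* Option A: constant subtraction, no per-packet memory access. */
static inline odp_packet_t pkt_from_uarea(void *uarea)
{
    return (odp_packet_t)((uintptr_t)uarea - pkt_inline.udata);
}

/* Option B: handle stored in the VLIB buffer; reading it back touches
 * the buffer's second cache line for every single packet. */
typedef struct {
    uint8_t      first_cacheline[64];   /* metadata VPP already touches   */
    odp_packet_t odp_pkt;               /* lands in the second cache line */
} vlib_buffer_sketch_t;

static inline odp_packet_t pkt_from_vlib(vlib_buffer_sketch_t *b)
{
    return b->odp_pkt;                  /* per-packet load */
}
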
>> >>>> On 7 December 2017 at 18:37, Bill Fischofer
>> >>>> <bill.fischo...@linaro.org>
>> >>>> wrote:
>> >>>>
>> >>>>> Yes, but _odp_packet_inline.udata is clearly not in the VLIB cache
>> >>>>> line either, so it's a separate cache line access. Are you seeing
>> >>>>> this difference in real runs or in microbenchmarks? Why isn't the
>> >>>>> entire VLIB being prefetched at dispatch? Sequential prefetching
>> >>>>> should add negligible overhead.
>> >>>>>
>> >>>>> On Thu, Dec 7, 2017 at 11:13 AM, Michal Mazur
>> >>>>> <michal.ma...@linaro.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> It seems that only the first cache line of the VLIB buffer is in
>> >>>>>> L1; the new pointer can be placed only in the second cacheline.
>> >>>>>> Using a constant offset between the user area and the ODP header I
>> >>>>>> get 11 Mpps, with the pointer stored in the VLIB buffer only 10
>> >>>>>> Mpps, and with this new API 10.6 Mpps.
>> >>>>>>
>> >>>>>> On 7 December 2017 at 18:04, Bill Fischofer
>> >>>>>> <bill.fischo...@linaro.org
>> >>>>>> > wrote:
>> >>>>>>
>> >>>>>>> How would calling an API be better than referencing the stored
>> >>>>>>> data
>> >>>>>>> yourself? A cache line reference is a cache line reference, and
>> >>>>>>> presumably
>> >>>>>>> the VLIB buffer is already in L1 since it's your active data.
>> >>>>>>>
>> >>>>>>> On Thu, Dec 7, 2017 at 10:45 AM, Michal Mazur <
>> >>>>>>> michal.ma...@linaro.org> wrote:
>> >>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> For the odp4vpp plugin we need a new API function which, given a
>> >>>>>>>> user area pointer, will return a pointer to the ODP packet
>> >>>>>>>> buffer. It is needed when packets processed by VPP are sent back
>> >>>>>>>> to ODP and only a pointer to the VLIB buffer data (stored inside
>> >>>>>>>> the user area) is known.
>> >>>>>>>>
>> >>>>>>>> I have tried storing the ODP buffer pointer in the VLIB data, but
>> >>>>>>>> reading it back for every packet lowers performance by 800 kpps.
>> >>>>>>>>
>> >>>>>>>> For the odp-dpdk implementation it can look like:
>> >>>>>>>>
>> >>>>>>>> /**
>> >>>>>>>>  * @internal Map a user area pointer back to its packet handle
>> >>>>>>>>  * @param uarea  User area pointer of a packet
>> >>>>>>>>  * @return Handle of the packet owning the user area
>> >>>>>>>>  */
>> >>>>>>>> static inline odp_packet_t _odp_packet_from_user_area(void *uarea)
>> >>>>>>>> {
>> >>>>>>>>        return (odp_packet_t)((uintptr_t)uarea -
>> >>>>>>>>                              _odp_packet_inline.udata);
>> >>>>>>>> }
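
As a hedged illustration only (not part of the proposal), this is how the
odp4vpp transmit path could use such a function, assuming the vlib_buffer_t
is placed at the start of the ODP packet user area and that the proposed
_odp_packet_from_user_area() is exposed through the platform inline headers;
the wrapper name below is hypothetical, while odp_pktout_send() is the
existing ODP transmit call.

#include <odp_api.h>
#include <vlib/vlib.h>

/* Hypothetical odp4vpp helper: recover the packet handle from the
 * vlib_buffer_t that lives at the start of the packet's user area. */
static inline odp_packet_t
odp_packet_from_vlib_buffer(vlib_buffer_t *b)
{
    return _odp_packet_from_user_area((void *)b);
}

/* e.g. in an output node loop:
 *
 *     odp_packet_t pkt = odp_packet_from_vlib_buffer(b0);
 *     odp_pktout_send(pktout, &pkt, 1);
 */
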
>> >>>>>>>>
>> >>>>>>>> Please let me know what you think.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Michal
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>
>
