Re: [lng-odp] 32b support in ODP-Cloud

Bill Fischofer Wed, 29 Mar 2017 05:55:53 -0700

On Wed, Mar 29, 2017 at 7:29 AM, Ola Liljedahl <ola.liljed...@linaro.org>
wrote:


>
>
> On 29 March 2017 at 13:25, Bill Fischofer <bill.fischo...@linaro.org>
> wrote:
>
>>
>>
>> On Wed, Mar 29, 2017 at 5:47 AM, Ola Liljedahl <ola.liljed...@linaro.org>
>> wrote:
>>
>>> On 29 March 2017 at 10:43, Francois Ozog <francois.o...@linaro.org>
>>> wrote:
>>>
>>>> If there is a cost to get the virtual address, then I assume translation is
>>>> NOT just casting: correct?
>>>>
>>> Correct. linux-generic has a number of dereferences in the code that
>>> returns e.g. the buffer address from a buffer handle. This is not optimised
>>> for performance. The design does provide the ability to check buffer
>>> handles for correctness/validity, but I cannot see any code that actually
>>> does this, so an invalid buffer handle might crash the code (through an
>>> out-of-bounds memory access).
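A minimal sketch of what a checked handle-to-address translation could look like, assuming a hypothetical 32-bit handle that packs a pool index into the upper 8 bits and an element index into the lower 24 bits. The layout and all names (`pool_tbl`, `pool_desc_t`, `buf_handle_to_addr`, `make_handle`) are invented for illustration and are not the linux-generic code. It shows the dependent loads such a translation costs and where a validity check could return NULL instead of reading out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit handle layout: [31:24] pool index, [23:0] element index. */
#define POOL_SHIFT 24
#define ELEM_MASK  0x00FFFFFFu
#define MAX_POOLS  16

typedef uint32_t buf_handle_t;

typedef struct {
    uint8_t *base;      /* start of the pool's element array */
    size_t   elem_size; /* bytes per element */
    uint32_t num_elems; /* number of valid elements */
} pool_desc_t;

/* Global pool table; entries are NULL for unused pools. */
static pool_desc_t *pool_tbl[MAX_POOLS];

/* Translate a handle to an address. Returns NULL for an invalid handle
 * instead of performing an out-of-bounds access. */
static void *buf_handle_to_addr(buf_handle_t h)
{
    uint32_t pool_idx = h >> POOL_SHIFT;
    uint32_t elem_idx = h & ELEM_MASK;

    if (pool_idx >= MAX_POOLS)
        return NULL;
    /* First dependent load: the pool table entry. */
    pool_desc_t *pool = pool_tbl[pool_idx];
    /* Second dependent load: the pool descriptor fields. */
    if (pool == NULL || elem_idx >= pool->num_elems)
        return NULL;
    return pool->base + (size_t)elem_idx * pool->elem_size;
}

/* Build a handle from its parts (debug-checked). */
static buf_handle_t make_handle(uint32_t pool_idx, uint32_t elem_idx)
{
    assert(pool_idx < MAX_POOLS && elem_idx <= ELEM_MASK);
    return (pool_idx << POOL_SHIFT) | elem_idx;
}
```

A pool is registered by storing its descriptor in `pool_tbl`; after that, `buf_handle_to_addr(make_handle(3, 2))` yields the third element of pool 3, while an out-of-range pool or element index yields NULL.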
>>>
>> I suspect that the hot spots are due to the fact that in many cases we
>> are only using a 32-bit value and wrapping it in a 64-bit handle. This was
>> originally done to make the strong typing 32/64 bit agnostic.  But this can
>> change if we widen the linux-generic handles to use the full pointer width.
>> Ola: you should no longer be seeing those hot spots in the packet code
>> since with the more recent changes Petri introduced the odp_packet_t is now
>> simply a pointer to an odp_packet_hdr_t, similar to how in odp-dpdk it is a
>> pointer to an rte_mbuf.
>>
> This is/was a benchmark (odp_sched_latency) that uses buffers. But it is
> good that we now use pointers for packet handles. Perhaps we can do the
> same thing for buffers and other event types?
>

The changes are pretty straightforward. I'll look into posting a similar
patch for buffers to match the packet changes. One of the advantages of
using abstract types is how easy it really is to change their internals and
just update a couple of conversion routines.
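As an illustration of that point, a sketch of the pointer-handle pattern described above for odp_packet_t: the public handle is a pointer to an incomplete struct type, so the conversion routines collapse to casts. The names (`my_buf_t`, `my_buf_hdr_t`, `buf_to_hdr`, `hdr_to_buf`, `my_buf_alloc`) are hypothetical stand-ins, not the actual ODP definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Implementation-private buffer header (hypothetical layout). */
typedef struct my_buf_hdr_t {
    uint32_t size;
    uint8_t  data[64];
} my_buf_hdr_t;

/* Public handle: a pointer to an incomplete struct type, so it is strongly
 * typed but exposes nothing about the header layout. */
typedef struct my_buf_s *my_buf_t;

#define MY_BUF_INVALID ((my_buf_t)NULL)

/* With pointer-width handles, both conversions are plain casts. */
static inline my_buf_hdr_t *buf_to_hdr(my_buf_t buf)
{
    return (my_buf_hdr_t *)(void *)buf;
}

static inline my_buf_t hdr_to_buf(my_buf_hdr_t *hdr)
{
    return (my_buf_t)(void *)hdr;
}

static my_buf_t my_buf_alloc(void)
{
    my_buf_hdr_t *hdr = malloc(sizeof(*hdr));
    if (hdr == NULL)
        return MY_BUF_INVALID;
    hdr->size = sizeof(hdr->data);
    return hdr_to_buf(hdr);
}

static void *my_buf_addr(my_buf_t buf)
{
    assert(buf != MY_BUF_INVALID);
    /* No table lookup: the address is the handle plus a fixed offset. */
    return buf_to_hdr(buf)->data;
}
```

Only `buf_to_hdr()` and `hdr_to_buf()` know how a handle maps to a header, so moving to a different internal representation would mean changing those two bodies while callers that use `my_buf_t` stay untouched.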


>
>
>> Certainly in odp-cloud we should do a similar mapping for other key
>> handle types.
>>
>> Another approach, which requires a bit more config/tools technology, would
>> be to support multiple type definitions as a performance tuning option.
>> When developing you'd compile using an include structure that has handles
>> defined as pointers to structs to get the strong typing, and then support a
>> compile option for production use that redefines the handles to be
>> uint32_t. That would reduce their footprint to 32 bits but would lose
>> strong type checking; however, that's a trade-off an application writer
>> could decide is worthwhile.
>>
>> Currently we support a -DDEBUG option that includes additional runtime
>> checking. We could do this via a similar -DTYPE_CHECK option (default) and
>> support a -DNO_TYPE_CHECK for the "compact" handles.
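A minimal sketch of how such a build switch might be wired, assuming the handle carries a 32-bit value in both modes (the "32-bit value wrapped in a pointer-sized handle" described above). `NO_TYPE_CHECK` is the option name proposed in this thread; the macros and handle names are hypothetical, not existing ODP code.

```c
#include <assert.h>
#include <stdint.h>

#ifdef NO_TYPE_CHECK
/* "Compact" production handles: 32-bit scalars. my_buf_t and my_pool_t
 * become the same type, so the compiler cannot tell them apart. */
typedef uint32_t my_buf_t;
typedef uint32_t my_pool_t;
#define HANDLE_TO_U32(h)       ((uint32_t)(h))
#define U32_TO_HANDLE(type, v) ((type)(v))
#else
/* Default (TYPE_CHECK): pointers to distinct incomplete structs. Strongly
 * typed, but pointer-sized (64 bits under LP64). */
typedef struct my_buf_s  *my_buf_t;
typedef struct my_pool_s *my_pool_t;
#define HANDLE_TO_U32(h)       ((uint32_t)(uintptr_t)(h))
#define U32_TO_HANDLE(type, v) ((type)(uintptr_t)(v))
#endif

/* Index 0 is reserved so that the invalid handle is NULL in TYPE_CHECK mode. */
#define MY_BUF_INVALID U32_TO_HANDLE(my_buf_t, 0)

/* The same source compiles in both modes; only the handle's C type differs. */
static my_buf_t buf_from_index(uint32_t idx)
{
    assert(idx != 0); /* 0 is reserved for MY_BUF_INVALID */
    return U32_TO_HANDLE(my_buf_t, idx);
}

static uint32_t buf_index(my_buf_t buf)
{
    return HANDLE_TO_U32(buf);
}
```

Building with -DNO_TYPE_CHECK halves the handle size on a 64-bit target, but a my_pool_t passed where a my_buf_t is expected is no longer diagnosed, which is the trade-off described above.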
>>
>>
>>>
>>>> FF
>>>>
>>>> On 29 March 2017 at 10:00, Ola Liljedahl <ola.liljed...@linaro.org>
>>>> wrote:
>>>>
>>>>> So there is a choice between
>>>>> A) enabling static type checking in the compiler through strong typing
>>>>> => requires (syntactical) pointers in C => handles are 64-bit on 64-bit
>>>>> systems
>>>>> B) optimising for size and cache efficiency by using 32-bit (scalar)
>>>>> handles
>>>>>
>>>>> Currently this choice is hard-wired into the ODP linux-generic
>>>>> implementation.
>>>>>
>>>>> When profiling some ODP examples, I can see hot spots in the functions
>>>>> that convert "pointer"-handles into the actual object pointers
>>>>> (virtual addresses). So we are paying a double price here: handles are
>>>>> large (which increases cache pressure) and we have to translate handles
>>>>> to addresses before we can reference the objects in the ODP calls.
>>>>>
>>>>> On 29 March 2017 at 06:10, Bill Fischofer <bill.fischo...@linaro.org>
>>>>> wrote:
>>>>> >
>>>>> > On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli
>>>>> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >>
>>>>> >> On 28 March 2017 at 22:27, Bill Fischofer <bill.fischo...@linaro.org>
>>>>> >> wrote:
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli
>>>>> >> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >>
>>>>> >> >> On 27 March 2017 at 08:36, Ola Liljedahl <ola.liljed...@linaro.org>
>>>>> >> >> wrote:
>>>>> >> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli
>>>>> >> >> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >> >> My answers inline. I was confused as hell just a month back :)
>>>>> >> >> >>
>>>>> >> >> >> On 23 March 2017 at 06:28, Francois Ozog <francois.o...@linaro.org>
>>>>> >> >> >> wrote:
>>>>> >> >> >>
>>>>> >> >> >>> The more I dig the less I understand ;-)
>>>>> >> >> >>>
>>>>> >> >> >>> Let me ask a few questions:
>>>>> >> >> >>>
>>>>> >> >> >>> - in the future, when selling 32-bit silicon, which architecture
>>>>> >> >> >>> version will it be: ARMv7 or ARMv8?
>>>>> >> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8. But people
>>>>> >> >> > are still building SoCs with e.g. ARM920, which is ARMv4T or
>>>>> >> >> > something.
>>>>> >> >> >
>>>>> >> >> >>>
>>>>> >> >> >>
>>>>> >> >> >> What you are referring to is the ISA version, not the architecture.
>>>>> >> >> >> AArch32 and AArch64 are architectures. ARMv8 also supports AArch32
>>>>> >> >> >> (i.e. AArch32 with the ARMv8 ISA).
>>>>> >> >> > ARMv8 has two architectural states, AArch32 and AArch64. An ARMv8
>>>>> >> >> > implementation can implement either or both. There are already
>>>>> >> >> > examples out there of all these different combinations.
>>>>> >> >> >
>>>>> >> >> > AArch32 supports the A32 and T32 ISAs; these are closely related to
>>>>> >> >> > (basically extensions of) the corresponding ARMv7a ARM and
>>>>> >> >> > Thumb(-2) ISAs.
>>>>> >> >> > The A32 (and T32?) ISAs have some of the ARMv8 extensions, e.g.
>>>>> >> >> > load-acquire, store-release, crypto instructions, simplified WFE
>>>>> >> >> > support etc.
>>>>> >> >> > A user space ARMv7a image should run unmodified on ARMv8/AArch32. I
>>>>> >> >> > don't know about other privilege levels, but I can imagine an ARMv7a
>>>>> >> >> > kernel running in AArch32 with an AArch64 hypervisor.
>>>>> >> >> >
>>>>> >> >> > AArch64 supports the A64 ISA. This ISA actually supports both 32-bit
>>>>> >> >> > and 64-bit operations (although all addresses are 64-bit AFAIK).
>>>>> >> >> > 32-bit operations use Wn registers and 64-bit operations use Xn
>>>>> >> >> > registers. It's the same register set; Wn just denotes the lower 32
>>>>> >> >> > bits.
>>>>> >> >> >
>>>>> >> >> >>
>>>>> >> >> >>> - will the target solution run ALL in 32 bits (boot in 32 bits,
>>>>> >> >> >>> 32-bit Linux, 32-bit apps)?
>>>>> >> >> >>> - or will the target solution be hybrid (64-bit Linux and some
>>>>> >> >> >>> 32-bit apps)?
>>>>> >> >> > I think this is the more likely path. If you have 4GB or more of RAM
>>>>> >> >> > (and also other stuff that needs physical addressing), you want a
>>>>> >> >> > 64-bit kernel.
>>>>> >> >> >
>>>>> >> >> >>>
>>>>> >> >> >>
>>>>> >> >> >> The target solution could be hybrid. Linux could be 64b, the
>>>>> >> >> >> applications could be 32b. It is my understanding that everything
>>>>> >> >> >> 32b is also possible using AArch32.
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>> When I read "AArch64 was designed to remove known implementation
>>>>> >> >> >>> challenges of AArch32 cores" on
>>>>> >> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>>>>> >> >> >>> I wonder if stating we support AArch32 is a good idea...
>>>>> >> >> >>>
>>>>> >> >> >>> So what is the best way to describe what we want?
>>>>> >> >> >>> - ARMv8 LP64 or ILP32?
>>>>> >> >> >>> - AArch64 LP64 or ILP32?
>>>>> >> >> >>> - LP64 or ILP32?
>>>>> >> >> >>>
>>>>> >> >> >>> I think the best way to say it is 'we support AArch64 and AArch32'.
>>>>> >> >> > Re AArch64, LP64 or ILP32 applications?
>>>>> >> >> >
>>>>> >> >> > AArch32 or ARMv7a?
>>>>> >> >> >
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>> FF
>>>>> >> >> >>>
>>>>> >> >> >>>
>>>>> >> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli <
>>>>> >> >> >>> honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >> >>>
>>>>> >> >> >>>> Hi Bill / Matt and others,
>>>>> >> >> >>>> What I was trying to say in our discussion is that the ODP-Cloud
>>>>> >> >> >>>> code should not be pointer heavy.
>>>>> >> >> >>>>
>>>>> >> >> >>>> Please take a look at this video from BUD17:
>>>>> >> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/ (unfortunately
>>>>> >> >> >>>> there are no slides; I am trying to get them). This talks about the
>>>>> >> >> >>>> performance of 32b applications on AArch64. One of the applications
>>>>> >> >> >>>> showed a huge performance improvement when running in 32b mode
>>>>> >> >> >>>> (ILP32 in this particular case) on AArch64, compared to the same
>>>>> >> >> >>>> application compiled for 64b mode on AArch64 (i.e. the 64b build
>>>>> >> >> >>>> performed very poorly). My understanding is that this particular
>>>>> >> >> >>>> application is a pointer-chasing application. Other applications,
>>>>> >> >> >>>> which are not pointer heavy, do not show this behavior.
>>>>> >> >> > Isn't the problem with LP64 that if you have a lot of pointers stored
>>>>> >> >> > in data structures, these take 2x the space of ILP32 pointers and
>>>>> >> >> > thus increase the cache pressure?
>>>>> >> >> >
>>>>> >> >> > I don't think it is the pointer chasing itself that is penalised by
>>>>> >> >> > 64-bit pointers. Pointer-chasing apps are penalised by long
>>>>> >> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies, DRAM
>>>>> >> >> > latency).
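To make the 2x footprint concrete, a sketch comparing a node linked by pointers with one linked by 32-bit indices into a node array. The struct names are illustrative, and the figures assume a 64-byte cache line: under LP64 the pointer-linked node is 24 bytes (2 whole nodes per line) versus 16 bytes (4 per line) for the index-linked one, while under ILP32 or AArch32 both layouts are 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linked by pointers: under LP64 each link costs 8 bytes. */
struct node_ptr {
    struct node_ptr *next;
    struct node_ptr *prev;
    uint32_t         key;
    uint32_t         value;
};

/* Linked by 32-bit indices into a node array: 4 bytes per link on any ABI,
 * at the cost of a base + index computation on each hop. */
struct node_idx {
    uint32_t next;
    uint32_t prev;
    uint32_t key;
    uint32_t value;
};

/* Number of whole nodes that fit in one cache line (64 bytes assumed). */
static unsigned nodes_per_line(size_t node_size)
{
    assert(node_size > 0);
    return (unsigned)(64 / node_size);
}
```

The index form is essentially the same idea as 32-bit ODP handles: the link is compact, and the address is computed when needed.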
>>>>> >> >> >
>>>>> >> >> >>>>
>>>>> >> >> >>>> So, we need to make sure ODP-Cloud is not pointer heavy and does
>>>>> >> >> >>>> not force the application to be pointer heavy, in order to get good
>>>>> >> >> >>>> performance out of 64b systems.
>>>>> >> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>>>>> >> >> > address lookup of the handle needs to be efficient (from a cache
>>>>> >> >> > perspective) though; even now I can see hotspots in the function
>>>>> >> >> > that returns an address from a handle.
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >> Yes, this is what I am trying to convey. If we have 32-bit handles,
>>>>> >> >> it does not matter whether it is AArch32 or AArch64; the performance
>>>>> >> >> will be optimized.
>>>>> >> >
>>>>> >> >
>>>>> >> > The only way we've been able to achieve strong typing with ODP is if
>>>>> >> > the handles are of size sizeof(void *). That is not 32 bits on
>>>>> >> > AArch64 (LP64), so I don't think this will hold. Obviously when ODP is
>>>>> >> > compiled for AArch32, pointers (and hence handles) are 32 bits.
>>>>> >> >
>>>>> >> I did not understand your comment on strong typing. Can you elaborate
>>>>> >> or provide an example?
>>>>> >> If the handles need to be 64b (i.e. even on a 32b system they are
>>>>> >> 64b), then we should keep them as 64b. Otherwise, performance should
>>>>> >> be given higher priority.
>>>>> >
>>>>> >
>>>>> > Look at the ODP strong type files in the plat directory. We achieve
>>>>> > strong typing by defining handles to be pointers to structs, which C
>>>>> > treats as different types. There doesn't appear to be any other way to
>>>>> > achieve this, since C typedefs are weakly typed (they only create
>>>>> > aliases for existing types).
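A small illustration of the difference, using C11 `_Generic` to ask the compiler which type it sees. The handle names are made up for the example and are not the ODP plat definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain typedefs are aliases: both of these are just uint32_t. */
typedef uint32_t weak_buf_t;
typedef uint32_t weak_pool_t;

/* Pointers to distinct (incomplete) struct types are distinct types. */
typedef struct strong_buf_s  *strong_buf_t;
typedef struct strong_pool_s *strong_pool_t;

/* The price of strong typing: the handles are pointer-sized. */
static_assert(sizeof(weak_buf_t) == 4, "weak handles are 32-bit");
static_assert(sizeof(strong_buf_t) == sizeof(void *),
              "strong handles are pointer-sized");

/* Evaluate to 1 if x has the given handle type, 0 otherwise. */
#define IS_WEAK_BUF(x)   _Generic((x), weak_buf_t: 1, default: 0)
#define IS_STRONG_BUF(x) _Generic((x), strong_buf_t: 1, default: 0)

/* Passing a weak_pool_t to a weak_buf_t parameter compiles silently;
 * passing a strong_pool_t here draws an incompatible-pointer diagnostic. */
static void take_strong_buf(strong_buf_t buf)
{
    (void)buf;
}
```

With the weak typedefs a pool index passes the buffer-type test, so mixing them goes unnoticed; with the struct-pointer handles the compiler can tell them apart, at the cost of sizeof(void *) per handle.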
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> >>>>
>>>>> >> >> >>>> Thank you,
>>>>> >> >> >>>> Honnappa
>>>>> >> >> >>>>
>>>>> >> >> >>>
>>>>> >> >> >>>
>>>>> >> >> >>>
>>>>> >> >> >>> --
>>>>> >> >> >>> [image: Linaro] <http://www.linaro.org/>
>>>>> >> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>>>>> >> >> >>> T: +33.67221.6485
>>>>> >> >> >>> francois.o...@linaro.org | Skype: ffozog
>>>>> >> >> >>>
>>>>> >> >> >>>
>>>>> >> >
>>>>> >> >
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
