Re: [lng-odp] 32b support in ODP-Cloud

Ola Liljedahl Wed, 29 Mar 2017 01:02:25 -0700

So there is a choice between
A) enabling static type checking in the compiler through strong typing
=> requires (syntactical) pointers in C => handles are 64-bit on 64-bit
systems
B) optimise for size and cache efficiency by using 32-bit (scalar) handles
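
To make the two options concrete, here is a minimal C sketch. All demo_*
names are hypothetical and only illustrate the idea; they are not the
actual ODP type definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option A: each handle type is a pointer to its own incomplete struct,
 * so the compiler treats the handle types as incompatible with each other.
 * Hypothetical names, not the real ODP definitions. */
typedef struct demo_pool_s   *demo_pool_a_t;
typedef struct demo_packet_s *demo_packet_a_t;

/* Option B: 32-bit scalar handles. Both typedefs are plain aliases of
 * uint32_t, so the compiler cannot tell them apart. */
typedef uint32_t demo_pool_b_t;
typedef uint32_t demo_packet_b_t;

static inline int demo_packet_a_is_valid(demo_packet_a_t pkt)
{
    return pkt != NULL;
}

static inline int demo_packet_b_is_valid(demo_packet_b_t pkt)
{
    return pkt != UINT32_MAX; /* assumed "invalid" marker for this sketch */
}

/* Passing a pool handle where a packet handle is expected:
 * with option B this compiles without any diagnostic. */
static inline int demo_mixup_b(void)
{
    demo_pool_b_t pool = 7;

    /* The option A equivalent,
     *     demo_pool_a_t pool_a = NULL;
     *     demo_packet_a_is_valid(pool_a);
     * draws an incompatible-pointer-type diagnostic from the compiler. */
    return demo_packet_b_is_valid(pool);
}

/* The size trade-off from the list above. */
_Static_assert(sizeof(demo_packet_a_t) == sizeof(void *),
               "option A handles are pointer sized");
_Static_assert(sizeof(demo_packet_b_t) == 4,
               "option B handles are always 4 bytes");
```

With option A the handle size follows the ABI (8 bytes for LP64, 4 bytes
for ILP32 or AArch32); with option B it is 4 bytes everywhere, at the cost
of losing the compiler's type check between handle kinds.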


Currently this choice is hard-wired into the ODP linux-generic implementation.

When profiling some ODP examples, I can see hot spots in the functions
that convert "pointer"-handles into the actual object pointers
(virtual addresses). So we are paying a double price here: handles are
large (which increases cache pressure) and we have to translate handles to
addresses before we can reference the objects in the ODP calls.
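
A rough sketch of that double price, assuming a pointer-typed handle that
encodes a table index rather than the object's address. The encoding, the
table and all demo_* names are assumptions for illustration; this is not
how linux-generic actually encodes its handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_OBJS 1024u

typedef struct { uint32_t len; } demo_obj_t;          /* hypothetical object */
typedef struct demo_obj_handle_s *demo_ptr_handle_t;  /* pointer-typed handle */
typedef uint32_t demo_idx_handle_t;                   /* 32-bit scalar handle */

static demo_obj_t demo_objs[DEMO_MAX_OBJS];

/* Pointer-typed handle carrying index + 1, so that NULL stays invalid. */
static inline demo_ptr_handle_t demo_ptr_handle_from_index(uint32_t idx)
{
    return (demo_ptr_handle_t)(uintptr_t)(idx + 1u);
}

/* The translation still has to happen on every call, even though the
 * handle is already pointer sized. */
static inline demo_obj_t *demo_ptr_handle_to_obj(demo_ptr_handle_t hdl)
{
    uintptr_t v = (uintptr_t)hdl;

    if (v == 0 || v > DEMO_MAX_OBJS)
        return NULL;
    return &demo_objs[v - 1];
}

/* Same translation cost for a scalar handle, but the handle itself is
 * half the size on a 64-bit system. */
static inline demo_obj_t *demo_idx_handle_to_obj(demo_idx_handle_t hdl)
{
    if (hdl >= DEMO_MAX_OBJS)
        return NULL;
    return &demo_objs[hdl];
}

/* A table of 16 handles, as an application might keep for a burst. */
typedef struct { demo_ptr_handle_t h[16]; } demo_ptr_burst_t;
typedef struct { demo_idx_handle_t h[16]; } demo_idx_burst_t;
```

On an LP64 system with 64-byte cache lines, demo_ptr_burst_t spans two
cache lines (128 bytes) while demo_idx_burst_t fits in one (64 bytes),
and both variants pay for the same table lookup.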

On 29 March 2017 at 06:10, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
> On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli
> <honnappa.nagaraha...@linaro.org> wrote:
>>
>> On 28 March 2017 at 22:27, Bill Fischofer <bill.fischo...@linaro.org>
>> wrote:
>> >
>> >
>> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli
>> > <honnappa.nagaraha...@linaro.org> wrote:
>> >>
>> >> On 27 March 2017 at 08:36, Ola Liljedahl <ola.liljed...@linaro.org>
>> >> wrote:
>> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli
>> >> > <honnappa.nagaraha...@linaro.org> wrote:
>> >> >> My answers inline. I was confused as hell just a month back :)
>> >> >>
>> >> >> On 23 March 2017 at 06:28, Francois Ozog <francois.o...@linaro.org>
>> >> >> wrote:
>> >> >>
>> >> >>> The more I dig the less I understand ;-)
>> >> >>>
>> >> >>> Let me ask a few questions:
>> >> >>>
>> >> >>> - in the future, when selling 32 bit silicon, which architecture
>> >> >>> version will it be: ARMv7 or ARMv8?
>> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8. But people
>> >> > are still building SoC's with e.g. ARM920 which is ARMv4T or
>> >> > something.
>> >> >
>> >> >>>
>> >> >>
>> >> >> What you are referring to is ISA version, not architecture. AArch32
>> >> >> and
>> >> >> AArch64 are architectures. ARMv8 also supports AArch32 (i.e. AArch32
>> >> >> with
>> >> >> ARMv8 ISA)
>> >> > ARMv8 has two architectural states, AArch32 and AArch64. An ARMv8
>> >> > implementation can implement either-or or both. There are already
>> >> > examples out there of all these different combinations.
>> >> >
>> >> > AArch32 supports the A32 and T32 ISA's, these are closely related to
>> >> > (basically extensions of) the corresponding ARMv7a ARM and Thumb(-2)
>> >> > ISA's.
>> >> > The A32 (and T32?) ISA's have some of the ARMv8 extensions, e.g.
>> >> > load-acquire, store-release, crypto instructions, simplified WFE
>> >> > support etc.
>> >> > A user space ARMv7a image should run unmodified on ARMv8/AArch32, I
>> >> > don't know about other privilege levels but I can imagine an ARMv7a
>> >> > kernel running in AArch32 with an AArch64 hypervisor.
>> >> >
>> >> > AArch64 supports the A64 ISA. This ISA actually supports both 32-bit
>> >> > and 64-bit operations (although all addresses are 64-bit AFAIK).
>> >> > 32-bit operations use Wn registers and 64-bit operations use Xn
>> >> > registers. It's the same register set, Wn just denotes the lower 32
>> >> > bits.
>> >> >
>> >> >>
>> >> >>> - will the target solution run ALL in 32 bits (boot in 32 bits,
>> >> >>> Linux 32 bits, 32 bits apps)?
>> >> >>> - or will the target solution be hybrid (64 bits Linux and some
>> >> >>> 32 bits apps)?
>> >> > I think this is the more likely path. If you have >= 4GB of RAM
>> >> > (and also other stuff that needs physical addressing), you want a
>> >> > 64-bit kernel.
>> >> >
>> >> >>>
>> >> >>
>> >> >> The target solution could be Hybrid. Linux could be 64b, the
>> >> >> applications
>> >> >> could be 32b. It is my understanding that everything 32b is also
>> >> >> possible
>> >> >> using AArch32.
>> >> >>
>> >> >>
>> >> >>> When I read "AArch64 was designed to remove known implementation
>> >> >>> challenges of AArch32 cores" on
>> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>> >> >>> I wonder if stating we support AArch32 is a good idea...
>> >> >>>
>> >> >>> So what is the best way to describe what we want?
>> >> >>> - ARMv8 LP64 or ILP32?
>> >> >>> - AArch64 LP64 or ILP32?
>> >> >>> - LP64 or ILP32?
>> >> >>>
>> >> >>> I think the best way to say is 'we support AArch64 and AArch32'.
>> >> > Re AArch64, LP64 or ILP32 applications?
>> >> >
>> >> > AArch32 or ARMv7a?
>> >> >
>> >> >>
>> >> >>
>> >> >>> FF
>> >> >>>
>> >> >>>
>> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli <
>> >> >>> honnappa.nagaraha...@linaro.org> wrote:
>> >> >>>
>> >> >>>> Hi Bill / Matt and others,
>> >> >>>> What I was trying to say in our discussion is that the
>> >> >>>> ODP-Cloud code should not be pointer heavy.
>> >> >>>>
>> >> >>>> Please take a look at this video from BUD17:
>> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/ (unfortunately
>> >> >>>> there are no slides, I am trying to get them). This talks about
>> >> >>>> the performance of 32b applications on AArch64. One of the
>> >> >>>> applications shows a huge performance improvement when running in
>> >> >>>> 32b mode (ILP32 in this particular case) on AArch64, compared to
>> >> >>>> the same application compiled for 64b mode running on AArch64
>> >> >>>> (i.e. in 64b compilation it performed very poorly). My
>> >> >>>> understanding is that this particular application is a pointer
>> >> >>>> chasing application. Other applications, which are not pointer
>> >> >>>> heavy, do not show this behavior.
>> >> > Isn't the problem with LP64 that if you have a lot of pointers stored
>> >> > in data structures, these take 2x the space of ILP32 pointers and
>> >> > thus increase the cache pressure?
>> >> >
>> >> > I don't think it is the pointer chasing itself that is penalised by
>> >> > 64-bit pointers. Pointer chasing apps are penalised by long
>> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies, DRAM
>> >> > latency).
>> >> >
>> >> >>>>
>> >> >>>> So, we need to make sure ODP-Cloud is not pointer heavy and does
>> >> >>>> not force the application to be pointer heavy, to get good
>> >> >>>> performance out of 64b systems.
>> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>> >> > address lookup of the handle needs to be efficient (from a cache
>> >> > perspective) though; I can already see hotspots in the function
>> >> > that returns an address from a handle.
>> >> >
>> >>
>> >> Yes, this is what I am trying to convey. If we have 32-bit handles, it
>> >> does not matter whether it is AArch32 or AArch64, the performance will
>> >> be optimized.
>> >
>> >
>> > The only way we've been able to achieve strong typing with ODP is if the
>> > handles are of size sizeof(void *). A 32-bit handle is not pointer-sized
>> > on AArch64, so I don't think this will hold. Obviously when ODP is
>> > compiled for AArch32, pointers (and hence handles) are 32 bits.
>> >
>> I did not understand your comment on strong typing. Can you elaborate
>> or provide an example?
>> If the handles need to be 64b (i.e. even on a 32b system they are
>> 64b), then we should keep them as 64b. Otherwise, performance should
>> be given higher priority.
>
>
> Look at the ODP strong type files in the plat directory. We achieve strong
> typing by defining handles to be pointers to structs, which C treats as
> different types. There doesn't appear to be any other way to achieve this
> since C typedefs are weakly typed.
>>
>>
>>
>> >>
>> >>
>> >> >>>>
>> >> >>>> Thank you,
>> >> >>>> Honnappa
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> [image: Linaro] <http://www.linaro.org/>
>> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>> >> >>> T: +33.67221.6485
>> >> >>> francois.o...@linaro.org | Skype: ffozog
>> >> >>>
>> >> >>>
>> >
>> >
