Re: [lng-odp] 32b support in ODP-Cloud

Ola Liljedahl Wed, 29 Mar 2017 03:48:40 -0700

On 29 March 2017 at 10:43, Francois Ozog <francois.o...@linaro.org> wrote:


> If there is a cost to get virtual address, then I assume translation is
> NOT just casting: correct?
>
Correct. linux-generic goes through a number of dereferences in the code that
returns e.g. the buffer address from a buffer handle, and this path is not
optimised for performance. The design does provide the ability to check buffer
handles for correctness/validity, but I cannot see any code that actually
does this, so an invalid buffer handle might crash the code (some
out-of-bounds memory access).
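
To illustrate what a validated handle-to-address translation could look like,
here is a minimal sketch. It is not the linux-generic code: the handle layout
(8 pool bits, 24 index bits), pool_tbl, buf_handle_make() and
buf_handle_to_addr() are all hypothetical names and assumptions chosen for the
example. The lookup costs one table access plus a bounds check and returns
NULL for a bad handle instead of touching memory out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not taken from linux-generic:
 * upper 8 bits select the pool, lower 24 bits select the buffer. */
#define POOL_BITS   8u
#define INDEX_BITS  24u
#define MAX_POOLS   (1u << POOL_BITS)
#define INDEX_MASK  ((1u << INDEX_BITS) - 1u)

typedef uint32_t buf_handle_t;  /* 32-bit scalar handle */

struct pool {
    uint8_t  *base;      /* start of the buffer area, NULL if pool unused */
    uint32_t  num_bufs;  /* number of buffers in the pool */
    uint32_t  buf_size;  /* size of each buffer in bytes */
};

static struct pool pool_tbl[MAX_POOLS];

/* Build a handle from a pool id and a buffer index. */
static inline buf_handle_t buf_handle_make(uint32_t pool_id, uint32_t index)
{
    assert(pool_id < MAX_POOLS && index <= INDEX_MASK);
    return (pool_id << INDEX_BITS) | index;
}

/* Translate a handle to a virtual address, or NULL if the handle is invalid. */
static inline void *buf_handle_to_addr(buf_handle_t hdl)
{
    const struct pool *p = &pool_tbl[hdl >> INDEX_BITS];
    uint32_t index = hdl & INDEX_MASK;

    if (p->base == NULL || index >= p->num_bufs)
        return NULL;  /* unused pool or index out of range */
    return p->base + (size_t)index * p->buf_size;
}
```

Whether the check is affordable in the fast path is a separate question; it
could also be compiled in only for debug builds.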


> FF
>
> On 29 March 2017 at 10:00, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>
>> So there is a choice between
>> A) enabling static type checking in the compiler through strong typing
>> => requires (syntactical) pointers in C => handles are 64-bit on 64-bit
>> systems
>> B) optimise for size and cache efficiency by using 32-bit (scalar) handles
>>
>> Currently this choice is hard-wired into the ODP linux-generic
>> implementation.
>>
>> When profiling some ODP examples, I can see hot spots in the functions
>> that convert "pointer"-handles into the actual object pointers
>> (virtual addresses). So we are paying a double price here: handles are
>> large (which increases cache pressure) and we have to translate handles to
>> addresses before we can reference the objects in the ODP calls.
>>
>> On 29 March 2017 at 06:10, Bill Fischofer <bill.fischo...@linaro.org>
>> wrote:
>> >
>> > On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli
>> > <honnappa.nagaraha...@linaro.org> wrote:
>> >>
>> >> On 28 March 2017 at 22:27, Bill Fischofer <bill.fischo...@linaro.org>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli
>> >> > <honnappa.nagaraha...@linaro.org> wrote:
>> >> >>
>> >> >> On 27 March 2017 at 08:36, Ola Liljedahl <ola.liljed...@linaro.org>
>> >> >> wrote:
>> >> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli
>> >> >> > <honnappa.nagaraha...@linaro.org> wrote:
>> >> >> >> My answers inline. I was confused as hell just a month back :)
>> >> >> >>
>> >> >> >> On 23 March 2017 at 06:28, Francois Ozog <francois.o...@linaro.org>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >>> The more I dig the less I understand ;-)
>> >> >> >>>
>> >> >> >>> Let me ask a few questions:
>> >> >> >>>
>> >> >> >>> - in the future, when selling 32 bit silicon, which architecture
>> >> >> >>> version will it be: ARMv7 or ARMv8?
>> >> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8. But people
>> >> >> > are still building SoCs with e.g. ARM920, which is ARMv4T or
>> >> >> > something.
>> >> >> >
>> >> >> >>>
>> >> >> >>
>> >> >> >> What you are referring to is ISA version, not architecture. AArch32
>> >> >> >> and AArch64 are architectures. ARMv8 also supports AArch32 (i.e.
>> >> >> >> AArch32 with ARMv8 ISA)
>> >> >> > ARMv8 has two execution states, AArch32 and AArch64. An ARMv8
>> >> >> > implementation can implement either one or both. There are already
>> >> >> > examples out there of all these different combinations.
>> >> >> >
>> >> >> > AArch32 supports the A32 and T32 ISA's, these are closely related to
>> >> >> > (basically extensions of) the corresponding ARMv7a ARM and Thumb(-2)
>> >> >> > ISA's.
>> >> >> > The A32 (and T32?) ISA's have some of the ARMv8 extensions, e.g.
>> >> >> > load-acquire, store-release, crypto instructions, simplified WFE
>> >> >> > support etc.
>> >> >> > A user space ARMv7a image should run unmodified on ARMv8/AArch32. I
>> >> >> > don't know about other privilege levels but I can imagine an ARMv7a
>> >> >> > kernel running in AArch32 with an AArch64 hypervisor.
>> >> >> >
>> >> >> > AArch64 supports the A64 ISA. This ISA actually supports both 32-bit
>> >> >> > and 64-bit operations (although all addresses are 64-bit AFAIK).
>> >> >> > 32-bit operations use Wn registers and 64-bit operations use Xn
>> >> >> > registers. It's the same register set, Wn just denotes the lower 32
>> >> >> > bits.
>> >> >> >
>> >> >> >>
>> >> >> >>> - will the target solution be running ALL in 32 bits (boot in 32
>> >> >> >>> bits, Linux 32 bits, 32 bits apps)?
>> >> >> >>> - or will the target solution be hybrid (64 bits Linux and some
>> >> >> >>> 32 bits apps)?
>> >> >> > I think this is the more likely path. If you have >= 4GB of RAM
>> >> >> > (and also other stuff that needs physical addressing), you want a
>> >> >> > 64-bit kernel.
>> >> >> >
>> >> >> >>>
>> >> >> >>
>> >> >> >> The target solution could be Hybrid. Linux could be 64b, the
>> >> >> >> applications could be 32b. It is my understanding that everything
>> >> >> >> 32b is also possible using AArch32.
>> >> >> >>
>> >> >> >>
>> >> >> >>> When I read "AArch64 was designed to remove known implementation
>> >> >> >>> challenges of AArch32 cores" on
>> >> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>> >> >> >>> I wonder if stating we support AArch32 is a good idea...
>> >> >> >>>
>> >> >> >>> So what is the best way to describe what we want?
>> >> >> >>> -  ARMv8    LP64 or ILP32 ?
>> >> >> >>> - AArch64  LP64 or ILP32 ?
>> >> >> >>> - LP64 or ILP32?
>> >> >> >>>
>> >> >> >>> I think the best way to say is 'we support AArch64 and AArch32'.
>> >> >> > Re AArch64, LP64 or ILP32 applications?
>> >> >> >
>> >> >> > AArch32 or ARMv7a?
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>> FF
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli <
>> >> >> >>> honnappa.nagaraha...@linaro.org> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Bill / Matt and others,
>> >> >> >>>>             What I was trying to say in our discussion is that
>> >> >> >>>> the ODP-Cloud code should not be pointer heavy.
>> >> >> >>>>
>> >> >> >>>> Please take a look at this video from BUD17:
>> >> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/ (unfortunately
>> >> >> >>>> there are no slides, I am trying to get them). This talks about
>> >> >> >>>> the performance of 32b applications on AArch64. One of the
>> >> >> >>>> applications has a huge performance improvement when running in
>> >> >> >>>> 32b mode (ILP32 in this particular case) on AArch64, compared to
>> >> >> >>>> the same application compiled for 64b mode and running on AArch64
>> >> >> >>>> (i.e. in 64b compilation it performed very poorly). My
>> >> >> >>>> understanding is that this particular application is a pointer
>> >> >> >>>> chasing application. Other applications, which are not pointer
>> >> >> >>>> heavy, do not have this behavior.
>> >> >> > Isn't the problem with LP64 that if you have a lot of pointers stored
>> >> >> > in data structures, these take 2x the space of ILP32 pointers and
>> >> >> > thus increase the cache pressure?
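
The size effect described here can be made concrete with a small sketch. The
two node layouts, the 64-byte cache line and nodes_per_line() are assumptions
made up for the illustration, not anything from ODP. The same three links cost
3 * sizeof(void *) bytes as pointers but always 12 bytes as 32-bit indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node that links to its neighbours with native pointers.
 * Its size follows the ABI: 3 * sizeof(void *). */
struct ptr_node {
    struct ptr_node *next;
    struct ptr_node *prev;
    void            *data;
};

/* The same node linking with 32-bit indices into some array.
 * Its size is 12 bytes regardless of the pointer width. */
struct idx_node {
    uint32_t next;
    uint32_t prev;
    uint32_t data;
};

#define CACHE_LINE 64u  /* assumed cache line size in bytes */

/* How many whole nodes of the given size fit in one cache line. */
static inline size_t nodes_per_line(size_t node_size)
{
    return CACHE_LINE / node_size;
}
```

Under these assumptions an LP64 build fits two pointer-linked nodes per cache
line where the index-linked layout fits five, which is the cache pressure
argument in numbers.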
>> >> >> >
>> >> >> > I don't think it is the pointer chasing itself that is penalised by
>> >> >> > 64-bit pointers. Pointer chasing apps are penalised by long
>> >> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies, DRAM
>> >> >> > latency).
>> >> >> >
>> >> >> >>>>
>> >> >> >>>> So, we need to make sure ODP-Cloud is not pointer heavy and does
>> >> >> >>>> not force the application to be pointer heavy, to get good
>> >> >> >>>> performance out of 64b systems.
>> >> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>> >> >> > address lookup of the handle needs to be efficient (from a cache
>> >> >> > perspective) though; already now I can see hotspots in the function
>> >> >> > that returns an address from a handle.
>> >> >> >
>> >> >>
>> >> >> Yes, this is what I am trying to convey. If we have 32-bit handles, it
>> >> >> does not matter whether it is AArch32 or AArch64, the performance will
>> >> >> be optimized.
>> >> >
>> >> >
>> >> > The only way we've been able to achieve strong typing with ODP is if the
>> >> > handles are of size sizeof(void *). 32-bit handles do not satisfy that on
>> >> > AArch64, so I don't think this will hold. Obviously when ODP is compiled
>> >> > for AArch32, pointers (and hence handles) are 32 bits.
>> >> >
>> >> I did not understand your comment on strong typing. Can you elaborate
>> >> or provide an example?
>> >> If the handles need to be 64b (i.e. even on a 32b system they are
>> >> 64b), then we should keep them as 64b. Otherwise, performance should
>> >> be given higher priority.
>> >
>> >
>> > Look at the ODP strong type files in the plat directory. We achieve strong
>> > typing by defining handles to be pointers to structs, which C treats as
>> > different types. There doesn't appear to be any other way to achieve this
>> > since C typedefs are weakly typed.
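
A minimal sketch of the two handle styles being compared, for readers who have
not seen the strong type files. All names here (strong_buf_t, strong_pkt_t,
weak_buf_t, weak_pkt_t and the helper functions) are hypothetical and only
follow the idea described above; this is not copied from the ODP headers.

```c
#include <assert.h>
#include <stdint.h>

/* Strong typing: each handle is a pointer to its own dummy struct.
 * The two anonymous structs are distinct types, so the compiler diagnoses
 * passing a strong_pkt_t where a strong_buf_t is expected. The price is that
 * every handle has the size of a pointer. */
typedef struct { uint8_t unused; } *strong_buf_t;
typedef struct { uint8_t unused; } *strong_pkt_t;

/* Weak typing: both typedefs are plain uint32_t, so they are always 4 bytes
 * but are freely interchangeable as far as the compiler is concerned. */
typedef uint32_t weak_buf_t;
typedef uint32_t weak_pkt_t;

/* The pointer-typed handle carries no real address in this sketch,
 * only an index stored in the pointer bits. */
static inline strong_buf_t strong_buf_from_index(uint32_t index)
{
    return (strong_buf_t)(uintptr_t)index;
}

static inline uint32_t strong_buf_index(strong_buf_t hdl)
{
    return (uint32_t)(uintptr_t)hdl;
}

static inline uint32_t weak_buf_index(weak_buf_t hdl)
{
    return hdl;
}
```

With these definitions, weak_buf_index() accepts a weak_pkt_t without any
diagnostic, while handing a strong_pkt_t to strong_buf_index() is flagged as
an incompatible pointer type at compile time.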
>> >>
>> >>
>> >>
>> >> >>
>> >> >>
>> >> >> >>>>
>> >> >> >>>> Thank you,
>> >> >> >>>> Honnappa
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> [image: Linaro] <http://www.linaro.org/>
>> >> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>> >> >> >>> T: +33.67221.6485
>> >> >> >>> francois.o...@linaro.org | Skype: ffozog
>> >> >> >>>
>> >> >> >>>
>> >> >
>> >> >
>>
>
>
>
