On 29 March 2017 at 10:43, Francois Ozog <francois.o...@linaro.org> wrote:
> If there is a cost to get virtual address, then I assume translation is
> NOT just casting: correct?
>
Correct. linux-generic has a number of dereferences in the code that
returns e.g. the buffer address from a buffer handle. This is not optimised
for performance. The design does provide the ability to check buffer handles
for correctness/validity, but I cannot see any code that actually does this,
so an invalid buffer handle might crash the code (some out-of-bounds memory
access).

> FF
>
> On 29 March 2017 at 10:00, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>
>> So there is a choice between
>> A) enabling static type checking in the compiler through strong typing
>> => requires (syntactical) pointers in C => handles are 64-bit on 64-bit
>> systems
>> B) optimise for size and cache efficiency by using 32-bit (scalar) handles
>>
>> Currently this choice is hard-wired into the ODP linux-generic
>> implementation.
>>
>> When profiling some ODP examples, I can see hot spots in the functions
>> that convert "pointer"-handles into the actual object pointers
>> (virtual addresses). So we are paying a double price here: handles are
>> large (increases cache pressure) and we have to translate handles to
>> addresses before we can reference the objects in the ODP calls.
>>
>> On 29 March 2017 at 06:10, Bill Fischofer <bill.fischo...@linaro.org>
>> wrote:
>> >
>> > On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli
>> > <honnappa.nagaraha...@linaro.org> wrote:
>> >>
>> >> On 28 March 2017 at 22:27, Bill Fischofer <bill.fischo...@linaro.org>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli
>> >> > <honnappa.nagaraha...@linaro.org> wrote:
>> >> >>
>> >> >> On 27 March 2017 at 08:36, Ola Liljedahl <ola.liljed...@linaro.org>
>> >> >> wrote:
>> >> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli
>> >> >> > <honnappa.nagaraha...@linaro.org> wrote:
>> >> >> >> My answers inline. I was confused as hell just a month back :)
>> >> >> >>
>> >> >> >> On 23 March 2017 at 06:28, Francois Ozog
>> >> >> >> <francois.o...@linaro.org> wrote:
>> >> >> >>
>> >> >> >>> The more I dig the less I understand ;-)
>> >> >> >>>
>> >> >> >>> Let me ask a few questions:
>> >> >> >>>
>> >> >> >>> - in the future, when selling 32 bit silicon, which architecture
>> >> >> >>> version will it be, ARMv7 or ARMv8 ?
>> >> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8. But
>> >> >> > people are still building SoC's with e.g. ARM920 which is ARMv4T
>> >> >> > or something.
>> >> >> >
>> >> >> >>>
>> >> >> >>
>> >> >> >> What you are referring to is ISA version, not architecture.
>> >> >> >> AArch32 and AArch64 are architectures. ARMv8 also supports
>> >> >> >> AArch32 (i.e. AArch32 with ARMv8 ISA)
>> >> >> > ARMv8 has two architectural states, AArch32 and AArch64. An ARMv8
>> >> >> > implementation can implement either-or or both. There are already
>> >> >> > examples out there of all these different combinations.
>> >> >> >
>> >> >> > AArch32 supports the A32 and T32 ISA's, these are closely related
>> >> >> > to (basically extensions of) the corresponding ARMv7a ARM and
>> >> >> > Thumb(-2) ISA's.
>> >> >> > The A32 (and T32?) ISA's have some of the ARMv8 extensions, e.g.
>> >> >> > load-acquire, store-release, crypto instructions, simplified WFE
>> >> >> > support etc.
>> >> >> > A user space ARMv7a image should run unmodified on ARMv8/AArch32.
>> >> >> > I don't know about other privilege levels but I can imagine an
>> >> >> > ARMv7a kernel running in AArch32 with an AArch64 hypervisor.
>> >> >> >
>> >> >> > AArch64 supports the A64 ISA. This ISA actually supports both
>> >> >> > 32-bit and 64-bit operations (although all addresses are 64-bit
>> >> >> > AFAIK). 32-bit operations use Wn registers and 64-bit operations
>> >> >> > use Xn registers. It's the same register set, Wn just denotes the
>> >> >> > lower 32 bits.
>> >> >> >
>> >> >> >>
>> >> >> >>> - will the target solution be running ALL in 32 bits? (boot in
>> >> >> >>> 32 bits, Linux 32 bits, 32 bits apps)?
>> >> >> >>> - or will the target solution be hybrid (64 bits Linux and some
>> >> >> >>> 32 bits apps).
>> >> >> > I think this is the more likely path. If you have >= 4GB of RAM
>> >> >> > (and also other stuff that needs physical addressing), you want a
>> >> >> > 64-bit kernel.
>> >> >> >
>> >> >> >>>
>> >> >> >>
>> >> >> >> The target solution could be Hybrid. Linux could be 64b, the
>> >> >> >> applications could be 32b. It is my understanding that
>> >> >> >> everything 32b is also possible using AArch32.
>> >> >> >>
>> >> >> >>
>> >> >> >>> When I read "AArch64 was designed to remove known implementation
>> >> >> >>> challenges of AArch32 cores" on
>> >> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>> >> >> >>> I wonder if stating we support AArch32 is a good idea...
>> >> >> >>>
>> >> >> >>> So what is the best way to describe what we want?
>> >> >> >>> - ARMv8 LP64 or ILP32 ?
>> >> >> >>> - AArch64 LP64 or ILP32 ?
>> >> >> >>> - LP64 or ILP32?
>> >> >> >>>
>> >> >> >>> I think the best way to say is 'we support AArch64 and AArch32'.
>> >> >> > Re AArch64, LP64 or ILP32 applications?
>> >> >> >
>> >> >> > AArch32 or ARMv7a?
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>> FF
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli
>> >> >> >>> <honnappa.nagaraha...@linaro.org> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Bill / Matt and others,
>> >> >> >>>> What I was trying to say in our discussion is that the
>> >> >> >>>> ODP-Cloud code should not be pointer heavy.
>> >> >> >>>>
>> >> >> >>>> Please take a look at this video from BUD17:
>> >> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/
>> >> >> >>>> (unfortunately there are no slides, I am trying to get them).
>> >> >> >>>> This talks about the performance of the 32b application on
>> >> >> >>>> AArch64. One of the applications has a huge performance
>> >> >> >>>> improvement while running in 32b mode (ILP32 in this
>> >> >> >>>> particular case) on AArch64 (when compared to the same
>> >> >> >>>> application compiled for 64b mode running on AArch64, i.e. in
>> >> >> >>>> 64b compilation it performed very poorly). My understanding is
>> >> >> >>>> that this particular application is a pointer chasing
>> >> >> >>>> application. Other applications, which are not pointer heavy,
>> >> >> >>>> do not have this behavior.
>> >> >> > Isn't the problem with LP64 that if you have a lot of pointers
>> >> >> > stored in data structures, these take 2x the space of ILP32
>> >> >> > pointers and thus increase the cache pressure?
>> >> >> >
>> >> >> > I don't think it is the pointer chasing itself that is penalised
>> >> >> > by 64-bit pointers. Pointer chasing apps are penalised by long
>> >> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies,
>> >> >> > DRAM latency).
>> >> >> >
>> >> >> >>>>
>> >> >> >>>> So, we need to make sure ODP-Cloud is not pointer heavy and
>> >> >> >>>> does not force the application to be pointer heavy, to get
>> >> >> >>>> good performance out of 64b systems.
>> >> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>> >> >> > address lookup of the handle needs to be efficient (from a cache
>> >> >> > perspective) though; already now I can see hotspots in the
>> >> >> > function that returns an address from a handle.
>> >> >> >
>> >> >>
>> >> >> Yes, this is what I am trying to convey. If we have 32-bit handles,
>> >> >> it does not matter whether it is AArch32 or AArch64, the
>> >> >> performance will be optimized.
>> >> >
>> >> >
>> >> > The only way we've been able to achieve strong typing with ODP is if
>> >> > the handles are of size sizeof(void *). This isn't the case in
>> >> > AArch64, so I don't think this will hold. Obviously when ODP is
>> >> > compiled for AArch32, pointers (and hence handles) are 32 bits.
>> >> >
>> >> I did not understand your comment on strong typing. Can you elaborate
>> >> or provide an example?
>> >> If the handles need to be 64b (i.e. even on a 32b system they are
>> >> 64b), then we should keep them as 64b. Otherwise, performance should
>> >> be given higher priority.
>> >
>> >
>> > Look at the ODP strong type files in the plat directory. We achieve
>> > strong typing by defining handles to be pointers to structs, which C
>> > treats as different types. There doesn't appear to be any other way to
>> > achieve this since C typedefs are weakly typed.
>> >>
>> >> >>
>> >> >> >>>>
>> >> >> >>>> Thank you,
>> >> >> >>>> Honnappa
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> [image: Linaro] <http://www.linaro.org/>
>> >> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>> >> >> >>> T: +33.67221.6485
>> >> >> >>> francois.o...@linaro.org | Skype: ffozog
>> >> >> >>>
>> >> >> >>>
>> >> >
>> >> >
>> >
>
>
>
> --
> [image: Linaro] <http://www.linaro.org/>
> François-Frédéric Ozog | *Director Linaro Networking Group*
> T: +33.67221.6485
> francois.o...@linaro.org | Skype: ffozog
>
>
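
To make the A/B choice discussed in this thread concrete, here is a minimal sketch in C. It is not the linux-generic code: every name in it (demo_buffer_t, demo_queue_t, demo_buf32_t, demo_pool, DEMO_NUM_BUFS and the helper functions) is made up for illustration, and the pool is assumed to be a simple static table. It shows (A) handles declared as pointers to distinct incomplete struct types, which are strongly typed but pointer-sized, and (B) a 32-bit scalar handle used as a table index. Both lookups include the kind of validity check mentioned above, so a bad handle yields NULL instead of an out-of-bounds access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object type; stands in for a real buffer header. */
typedef struct {
    uint32_t len;
    uint8_t  data[64];
} demo_buf_hdr_t;

#define DEMO_NUM_BUFS 16u

/* Assumed layout: one static table holding all buffer headers. */
static demo_buf_hdr_t demo_pool[DEMO_NUM_BUFS];

/* Option A: strongly typed handles. Each handle type is a pointer to its
 * own incomplete struct, so C treats them as different types, but each
 * handle occupies sizeof(void *) bytes (8 on LP64). */
typedef struct demo_buffer_s *demo_buffer_t;
typedef struct demo_queue_s  *demo_queue_t;

/* Option B: 32-bit scalar handle, always 4 bytes, but only weakly typed. */
typedef uint32_t demo_buf32_t;
#define DEMO_BUF32_INVALID UINT32_MAX

/* Option A: encode (index + 1) in the pointer bits so that NULL stays an
 * invalid handle. The handle is never dereferenced directly. */
static inline demo_buffer_t demo_buffer_from_index(uint32_t idx)
{
    return (demo_buffer_t)(uintptr_t)(idx + 1u);
}

/* Option A: translate handle to address; not just a cast, and validated. */
static inline demo_buf_hdr_t *demo_buffer_addr(demo_buffer_t hdl)
{
    uintptr_t v = (uintptr_t)hdl;

    if (v == 0 || v > DEMO_NUM_BUFS)
        return NULL;            /* invalid or out-of-range handle */
    return &demo_pool[v - 1];
}

/* Option B: translate 32-bit handle to address with a bounds check. */
static inline demo_buf_hdr_t *demo_buf32_addr(demo_buf32_t hdl)
{
    if (hdl >= DEMO_NUM_BUFS)
        return NULL;            /* also rejects DEMO_BUF32_INVALID */
    return &demo_pool[hdl];
}
```

With option A, passing a demo_queue_t where a demo_buffer_t is expected is diagnosed by the compiler as an incompatible pointer type. With option B, two handle types that are both typedefs of uint32_t would be interchangeable without any diagnostic, which is the trade-off described above.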