On Wed, Mar 29, 2017 at 7:29 AM, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>
> On 29 March 2017 at 13:25, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
>> On Wed, Mar 29, 2017 at 5:47 AM, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>
>>> On 29 March 2017 at 10:43, Francois Ozog <francois.o...@linaro.org> wrote:
>>>
>>>> If there is a cost to get virtual address, then I assume translation is
>>>> NOT just casting: correct?
>>>>
>>> Correct. linux-generic has a number of dereferences in the code that
>>> returns e.g. the buffer address from a buffer handle. This is not optimised
>>> for performance. The design does provide the ability to check buffer
>>> handles for correctness/validity but I cannot see any code that actually
>>> does this so an invalid buffer handle might crash the code (some out of
>>> bounds memory access).
>>>
>> I suspect that the hot spots are due to the fact that in many cases we
>> are only using a 32-bit value and wrapping it in a 64-bit handle. This was
>> originally done to make the strong typing 32/64 bit agnostic. But this can
>> change if we widen the linux-generic handles to use the full pointer width.
>> Ola: you should no longer be seeing those hot spots in the packet code
>> since with the more recent changes Petri introduced the odp_packet_t is now
>> simply a pointer to an odp_packet_hdr_t, similar to how in odp-dpdk it is a
>> pointer to an rte_mbuf.
>>
> This is/was a benchmark (odp_sched_latency) that uses buffers. But it is
> good that we now use pointers for packet handles. Perhaps we can do the
> same thing for buffers and other event types?
>

The changes are pretty straightforward. I'll look into posting a similar
patch for buffers to match the packet changes. One of the advantages of
using abstract types is how easy it really is to change their internals and
just update a couple of conversion routines.

>
>> Certainly in odp-cloud we should do a similar mapping for other key
>> handle types.
>>
>> Another approach, which requires a bit more config/tools technology, would be to
>> support multiple type definitions as a performance tuning option. When
>> developing you'd compile using an include structure that has handles
>> defined as pointers to structs to get the strong typing and then support a
>> compile option for production use that redefines the handles to be
>> uint32_t. That would reduce their footprint to 32 bits but would lose
>> strong type checking; however, that's a trade-off an application writer
>> could decide is worthwhile.
>>
>> Currently we support a -DDEBUG option that includes additional runtime
>> checking. We could do this via a similar -DTYPE_CHECK option (default) and
>> support a -DNO_TYPE_CHECK for the "compact" handles.
>>
>>>
>>>> FF
>>>>
>>>> On 29 March 2017 at 10:00, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>>>
>>>>> So there is a choice between
>>>>> A) enabling static type checking in the compiler through strong typing
>>>>> => requires (syntactical) pointers in C => handles are 64-bit on 64-bit
>>>>> systems
>>>>> B) optimise for size and cache efficiency by using 32-bit (scalar)
>>>>> handles
>>>>>
>>>>> Currently this choice is hard-wired into the ODP linux-generic
>>>>> implementation.
>>>>>
>>>>> When profiling some ODP examples, I can see hot spots in the functions
>>>>> that convert "pointer"-handles into the actual object pointers
>>>>> (virtual addresses). So we are paying a double price here, handles are
>>>>> large (increases cache pressure) and we have to translate handles to
>>>>> address before we can reference the objects in the ODP calls.
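
The choice between A) pointer handles and B) 32-bit scalar handles, selected by a compile option as suggested above, can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (`demo_buf_t`, `demo_pool`, `DEMO_NO_TYPE_CHECK` and so on) is hypothetical and is not taken from the ODP sources. `DEMO_NO_TYPE_CHECK` stands in for the proposed -DNO_TYPE_CHECK option, and the pool is a plain static array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; this is not the ODP implementation. */
#define DEMO_POOL_SIZE 8

typedef struct {
    uint32_t index;
    uint8_t data[64];
} demo_buf_hdr_t;

static demo_buf_hdr_t demo_pool[DEMO_POOL_SIZE];

#ifdef DEMO_NO_TYPE_CHECK
/* Compact variant: the handle is a 32-bit pool index. It is small, but the
 * compiler can no longer tell it apart from any other uint32_t, and every
 * use needs an index-to-address translation. */
typedef uint32_t demo_buf_t;

static inline demo_buf_t demo_buf_from_hdr(demo_buf_hdr_t *hdr)
{
    return (demo_buf_t)(hdr - demo_pool);
}

static inline demo_buf_hdr_t *demo_buf_to_hdr(demo_buf_t buf)
{
    assert(buf < DEMO_POOL_SIZE); /* validity check on the handle */
    return &demo_pool[buf];
}
#else
/* Default variant: the handle is a pointer to a distinct (never defined)
 * struct type, so the compiler type-checks it and the conversion is a cast. */
typedef struct demo_buf_s *demo_buf_t;

static inline demo_buf_t demo_buf_from_hdr(demo_buf_hdr_t *hdr)
{
    return (demo_buf_t)(void *)hdr;
}

static inline demo_buf_hdr_t *demo_buf_to_hdr(demo_buf_t buf)
{
    return (demo_buf_hdr_t *)(void *)buf;
}
#endif

/* Code written against the handle type and the two conversion routines is
 * identical under both variants. */
static inline void *demo_buf_addr(demo_buf_t buf)
{
    return demo_buf_to_hdr(buf)->data;
}
```

Only the typedef and the two conversion routines differ between the variants, which is the property of abstract handle types that the discussion relies on. Building with `-DDEMO_NO_TYPE_CHECK` selects the compact handles.
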
>>>>>
>>>>> On 29 March 2017 at 06:10, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>>>>> >
>>>>> > On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli
>>>>> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >>
>>>>> >> On 28 March 2017 at 22:27, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>>>>> >> >
>>>>> >> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli
>>>>> >> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >>
>>>>> >> >> On 27 March 2017 at 08:36, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>>>> >> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli
>>>>> >> >> > <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >> >> My answers inline. I was confused as hell just a month back :)
>>>>> >> >> >>
>>>>> >> >> >> On 23 March 2017 at 06:28, Francois Ozog <francois.o...@linaro.org> wrote:
>>>>> >> >> >>
>>>>> >> >> >>> The more I dig the less I understand ;-)
>>>>> >> >> >>>
>>>>> >> >> >>> Let me ask a few questions:
>>>>> >> >> >>>
>>>>> >> >> >>> - in the future, when selling 32 bit silicon, which architecture version
>>>>> >> >> >>> will it be ARMv7 or ARMv8 ?
>>>>> >> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8. But people
>>>>> >> >> > are still building SoC's with e.g. ARM920 which is ARMv4T or
>>>>> >> >> > something.
>>>>> >> >> >
>>>>> >> >> >>>
>>>>> >> >> >>
>>>>> >> >> >> What you are referring to is ISA version, not architecture. AArch32 and
>>>>> >> >> >> AArch64 are architectures. ARMv8 also supports AArch32 (i.e. AArch32 with
>>>>> >> >> >> ARMv8 ISA)
>>>>> >> >> > ARMv8 has two architectural states, AArch32 and AArch64. An ARMv8
>>>>> >> >> > implementation can implement either-or or both. There are already
>>>>> >> >> > examples out there of all these different combinations.
>>>>> >> >> >
>>>>> >> >> > AArch32 supports the A32 and T32 ISA's, these are closely related to
>>>>> >> >> > (basically extensions of) the corresponding ARMv7a ARM and Thumb(-2)
>>>>> >> >> > ISA's.
>>>>> >> >> > The A32 (and T32?) ISA's have some of the ARMv8 extensions, e.g.
>>>>> >> >> > load-acquire, store-release, crypto instructions, simplified WFE
>>>>> >> >> > support etc.
>>>>> >> >> > A user space ARMv7a image should run unmodified on ARMv8/AArch32, I
>>>>> >> >> > don't know about other privilege levels but I can imagine an ARMv7a
>>>>> >> >> > kernel running in AArch32 with an AArch64 hypervisor.
>>>>> >> >> >
>>>>> >> >> > AArch64 supports the A64 ISA. This ISA actually supports both 32-bit
>>>>> >> >> > and 64-bit operations (although all addresses are 64-bit AFAIK).
>>>>> >> >> > 32-bit operations use Wn registers and 64-bit operations use Xn
>>>>> >> >> > registers. It's the same register set, Wn just denotes the lower 32
>>>>> >> >> > bits.
>>>>> >> >> >
>>>>> >> >> >>>
>>>>> >> >> >>> - will the target solution be running ALL in 32 bits? (boot in 32 bits,
>>>>> >> >> >>> Linux 32 bits, 32 bits apps)?
>>>>> >> >> >>> - or will the target solution be hybrid (64 bits Linux and some 32 bits
>>>>> >> >> >>> apps)?
>>>>> >> >> > I think this is the more likely path. If you have >= 4GB of RAM
>>>>> >> >> > (and also other stuff that needs physical addressing), you want a
>>>>> >> >> > 64-bit kernel.
>>>>> >> >> >
>>>>> >> >> >>>
>>>>> >> >> >>
>>>>> >> >> >> The target solution could be Hybrid. Linux could be 64b, the applications
>>>>> >> >> >> could be 32b. It is my understanding that everything 32b is also possible
>>>>> >> >> >> using AArch32.
>>>>> >> >> >>
>>>>> >> >> >>> When I read "AArch64 was designed to remove known implementation
>>>>> >> >> >>> challenges of AArch32 cores" on
>>>>> >> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>>>>> >> >> >>> I wonder if stating we support AArch32 is a good idea...
>>>>> >> >> >>>
>>>>> >> >> >>> So what is the best way to describe what we want?
>>>>> >> >> >>> - ARMv8 LP64 or ILP32 ?
>>>>> >> >> >>> - AArch64 LP64 or ILP32 ?
>>>>> >> >> >>> - LP64 or ILP32?
>>>>> >> >> >>>
>>>>> >> >> >>> I think the best way to say is 'we support AArch64 and AArch32'.
>>>>> >> >> > Re AArch64, LP64 or ILP32 applications?
>>>>> >> >> >
>>>>> >> >> > AArch32 or ARMv7a?
>>>>> >> >> >
>>>>> >> >> >>
>>>>> >> >> >>> FF
>>>>> >> >> >>>
>>>>> >> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli
>>>>> >> >> >>> <honnappa.nagaraha...@linaro.org> wrote:
>>>>> >> >> >>>
>>>>> >> >> >>>> Hi Bill / Matt and others,
>>>>> >> >> >>>> What I was trying to say in our discussion is that the
>>>>> >> >> >>>> ODP-Cloud code should not be pointer heavy.
>>>>> >> >> >>>>
>>>>> >> >> >>>> Please take a look at this video from BUD17:
>>>>> >> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/ (unfortunately
>>>>> >> >> >>>> there are no slides, I am trying to get them). This talks about the
>>>>> >> >> >>>> performance of the 32b application on AArch64. One of the
>>>>> >> >> >>>> applications, has huge performance improvement while running in 32b
>>>>> >> >> >>>> mode (ILP32 in this particular case) on AArch64 (when compared to the
>>>>> >> >> >>>> same application compiled for 64b mode running on AArch64 i.e. in 64b
>>>>> >> >> >>>> compilation it performed very poorly).
>>>>> >> >> >>>> My understanding is that this
>>>>> >> >> >>>> particular application is a pointer chasing application. Other
>>>>> >> >> >>>> applications which are not pointer heavy, do not have this
>>>>> >> >> >>>> behavior.
>>>>> >> >> > Isn't the problem with LP64 that if you have a lot of pointers stored
>>>>> >> >> > in data structures, these take 2x the space of ILP32 pointers and thus
>>>>> >> >> > increase the cache pressure?
>>>>> >> >> >
>>>>> >> >> > I don't think it is the pointer chasing itself that is penalised by
>>>>> >> >> > 64-bit pointers. Pointer chasing apps are penalised by long
>>>>> >> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies, DRAM
>>>>> >> >> > latency).
>>>>> >> >> >
>>>>> >> >> >>>>
>>>>> >> >> >>>> So, we need to make sure ODP-Cloud is not pointer heavy and does not
>>>>> >> >> >>>> force the application to be pointer heavy, to get good performance out
>>>>> >> >> >>>> of 64b systems.
>>>>> >> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>>>>> >> >> > address lookup of the handle needs to be efficient (from a cache
>>>>> >> >> > perspective) though, already now I can see hotspots in the function
>>>>> >> >> > that returns an address from a handle.
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >> Yes, this is what I am trying to convey. If we have 32-bit handles, it
>>>>> >> >> does not matter whether it is AArch32 or AArch64, the performance will
>>>>> >> >> be optimized.
>>>>> >> >
>>>>> >> > The only way we've been able to achieve strong typing with ODP is if the
>>>>> >> > handles are of size sizeof(void *). This isn't the case in AArch64, so I
>>>>> >> > don't think this will hold. Obviously when ODP is compiled for AArch32
>>>>> >> > pointers (and hence handles) are 32-bits.
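
The cache-pressure argument above (stored pointers take twice the space under LP64) can be made concrete with a small sketch. The node types and the 64-byte cache line are assumptions made for illustration only; nothing here is taken from ODP. With the usual alignment rules on an LP64 target, the pointer-linked node occupies 24 bytes (two 8-byte pointers, a 4-byte value and 4 bytes of padding), while the index-linked node stays at 12 bytes under either data model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node types, for illustration only. */

/* Pointer-linked node: each link is sizeof(void *) bytes,
 * i.e. 8 bytes under LP64 and 4 bytes under ILP32. */
typedef struct ptr_node {
    struct ptr_node *next;
    struct ptr_node *prev;
    uint32_t value;
} ptr_node_t;

/* Index-linked node: links are 32-bit indices into an array, so the
 * layout is the same regardless of the data model. */
typedef struct {
    uint32_t next;
    uint32_t prev;
    uint32_t value;
} idx_node_t;

static_assert(sizeof(idx_node_t) == 12, "three 32-bit fields, no padding");

#define DEMO_CACHE_LINE 64 /* assumed cache line size in bytes */

/* Number of whole nodes that fit in one (assumed) cache line. */
static inline size_t nodes_per_cache_line(size_t node_size)
{
    return DEMO_CACHE_LINE / node_size;
}

/* Following an index link costs a base + index address computation,
 * which is the handle-to-address translation discussed in the thread. */
static inline idx_node_t *idx_next(idx_node_t *base, const idx_node_t *n)
{
    return &base[n->next];
}
```

Under these assumptions five index-linked nodes fit in a cache line against two pointer-linked nodes on LP64. The price is the extra translation in `idx_next()`, which is only cheap when the base address is already at hand.
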
>>>>> >> >
>>>>> >> I did not understand your comment on strong typing. Can you elaborate
>>>>> >> or provide an example?
>>>>> >> If the handles need to be 64b (i.e. even on a 32b system they are
>>>>> >> 64b), then we should keep them as 64b. Otherwise, performance should
>>>>> >> be given higher priority.
>>>>> >
>>>>> > Look at the ODP strong type files in the plat directory. We achieve strong
>>>>> > typing by defining handles to be pointers to structs, which C treats as
>>>>> > different types. There doesn't appear to be any other way to achieve this
>>>>> > since C typedefs are weakly typed.
>>>>> >>
>>>>> >> >> >>>>
>>>>> >> >> >>>> Thank you,
>>>>> >> >> >>>> Honnappa
>>>>> >> >> >>>>
>>>>> >> >> >>> --
>>>>> >> >> >>> [image: Linaro] <http://www.linaro.org/>
>>>>> >> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>>>>> >> >> >>> T: +33.67221.6485
>>>>> >> >> >>> francois.o...@linaro.org | Skype: ffozog
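
The strong-typing point made in the thread (handles defined as pointers to structs, which C treats as different types, whereas plain typedefs are weakly typed) can be illustrated with a minimal sketch. All type and function names below are hypothetical and are not the definitions from the ODP strong type files. The sketch also assumes, as is the case on mainstream platforms, that a small integer survives a round trip through `uintptr_t` and a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; not the actual ODP definitions. */

/* Weak typing: both handles are aliases of the same integer type, so the
 * compiler accepts one wherever the other is expected. */
typedef uint32_t weak_buffer_t;
typedef uint32_t weak_queue_t;

/* Strong typing: each handle is a pointer to its own struct type. The
 * structs are never defined; only the distinct pointer types matter. */
typedef struct demo_buffer_s *demo_buffer_t;
typedef struct demo_queue_s *demo_queue_t;

/* The price of this form of strong typing: the handle is pointer-sized. */
static_assert(sizeof(demo_buffer_t) == sizeof(void *), "handle is pointer-sized");

/* Conversion routines hide the representation. Here a 32-bit value is
 * wrapped in the pointer-sized handle. */
static inline demo_buffer_t demo_buffer_from_index(uint32_t idx)
{
    return (demo_buffer_t)(uintptr_t)idx;
}

static inline uint32_t demo_buffer_to_index(demo_buffer_t hdl)
{
    return (uint32_t)(uintptr_t)hdl;
}

static inline uint32_t weak_buffer_id(weak_buffer_t buf)
{
    return buf;
}

static inline uint32_t demo_buffer_id(demo_buffer_t buf)
{
    return demo_buffer_to_index(buf);
}

/* With the weak types, passing a queue handle to weak_buffer_id() compiles
 * without any diagnostic. With the strong types, passing a demo_queue_t to
 * demo_buffer_id() draws an incompatible-pointer-type diagnostic. */
```

Because only the conversion routines know what is stored inside the handle, the representation can later change (for example to a direct pointer to the object header) without touching code that merely passes handles around.
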