Pull request as requested is at https://github.com/ispc/ispc/pull/1227. My thanks to my employer for sponsoring the improvement.

Thanks for the help and the product, guys. For the same effort as hand porting all our SSE and AVX intrinsic code to ARM NEON, we got an ISPC port instead, which supports everything we need now and into the future.

In case anyone is interested, we have a Python script which calls ISPC both on Windows and via the Windows Subsystem for Linux to generate assembler files for ARM NEON float x4, SSE2 float x4, AVX float x8 and AVX2 float x16 for the calling conventions x86, x64-msvc and x64-sysv (we need to call ISPC under both Windows and Linux to get both the x64-msvc and x64-sysv calling convention output). Those assembler files are then parsed by the Python script, which translates the armhf output ISPC generates into armel for Android ARM and applies a few other hand tweaks. The results are committed directly to source control as they change rarely.

During build, the AT&T format assembler files are compiled as normal by CMake for Linux/BSD/OS X/Android. On Windows we abuse the MinGW-w64 GNU as assembler to make it generate an MSVC compatible .obj file from the AT&T-syntax, MSVC-calling-convention assembler files output by ISPC. That is then linked in by Visual Studio as per normal.

Believe it or not, it all works a treat. Choosing ISPC to generate assembler instead of writing it by hand was a risk which has paid off very well, and I'm sad to say we are done with optimisation now and are moving on to other topics far removed.

Nevertheless thanks once again, and you may like to know that the only reason we heard of your work is that I sit on the Programme Committee for CppCon, where this (now accepted) talk https://cppcon2016.sched.org/event/7nKw/spmd-programming-using-c-and-ispc was one assigned to me for review. So many thanks to that student for bringing ISPC to our attention!

Niall

On Friday, September 2, 2016 at 3:31:25 PM UTC+1, Niall Douglas wrote:
>
>
>>> It does seem very odd that LLVM wouldn't automatically inline a function consisting of a single instruction.
>>
>
> I've discovered through trial and error it is the lack of the "readnone"
> modifier which causes LLVM to not inline the function. After looking up
> that modifier I can see why that would be the case, and indeed why the lack
> of that modifier would penalise optimisation of ARM NEON generated because
> LLVM will assume every such function not so marked will change outcomes if
> global memory state could have been changed. In particular, it would
> severely restrict the reordering of instructions LLVM could do.
>
> Quite a few of the ARM NEON builtins are missing "readnone". None that I
> can see of the AVX builtins is missing it. I am surprised this problem
> hasn't been raised before, it's very obvious from the assembler output.
>
>>
>> I've asked my employer for the time to send a pull request. If it's
>> granted, happy to oblige.
>>
> I've been allowed this time by my employer who wishes to remain
> anonymous. I'll issue a pull request next week which applies nounwind
> readnone alwaysinline to everything in the NEON builtins, using the AVX
> builtins as a guide. I should think this will improve the optimisation
> quality of the NEON output quite a bit wherever it uses the builtins.
>
> Niall
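
The generation step described in the message (one ISPC run per instruction set and calling convention, each emitting an assembler file) could be sketched roughly as below. This is an illustration under stated assumptions, not the script the post refers to: the source name kernels.ispc, the output naming scheme, the helper names and the choice of host for the 32-bit and ARM builds are all hypothetical, and the target names and flag spellings (--target, --arch, --emit-asm, -o) should be checked against the ISPC version in use. The sketch only builds the command lines; it does not run ISPC or do the armhf to armel translation.

```python
import os
import shlex

# ISPC target names assumed for the four instruction sets named in the post.
TARGETS = {
    "neon": "neon-i32x4",
    "sse2": "sse2-i32x4",
    "avx": "avx1-i32x8",
    "avx2": "avx2-i32x16",
}

# (calling convention label, ISPC --arch value, host OS that ISPC runs on).
# The post says the msvc/sysv split comes from running ISPC under Windows
# versus Linux; which host builds 32-bit x86 is an assumption here.
CONVENTIONS = [
    ("x86", "x86", "windows"),
    ("x64-msvc", "x86-64", "windows"),
    ("x64-sysv", "x86-64", "linux"),
]


def ispc_command(source, isa, arch, out_path, ispc="ispc"):
    """Build the argument list for one ISPC run that emits assembler."""
    return [
        ispc,
        source,
        "--target=" + TARGETS[isa],
        "--arch=" + arch,
        "--emit-asm",
        "-o",
        out_path,
    ]


def plan(source):
    """Return (host, argv) pairs covering every ISA/convention combination."""
    stem = os.path.splitext(os.path.basename(source))[0]
    jobs = []
    for isa in ("sse2", "avx", "avx2"):
        for label, arch, host in CONVENTIONS:
            out = "{}_{}_{}.S".format(stem, isa, label)
            jobs.append((host, ispc_command(source, isa, arch, out)))
    # ARM NEON output (armhf as generated by ISPC); host choice is assumed.
    jobs.append(
        ("linux", ispc_command(source, "neon", "arm", stem + "_neon_armhf.S"))
    )
    return jobs


if __name__ == "__main__":
    for host, argv in plan("kernels.ispc"):
        print(host + ": " + " ".join(shlex.quote(a) for a in argv))
```

Keeping the plan as plain data, separate from whatever actually launches ISPC on each host, makes it easy to dispatch the "linux" jobs through the Linux subsystem and the "windows" jobs natively.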
