>
>  
>
>> We've seen an odd code generation pattern in the ARM NEON generated by 
>> ISPC however:
>>
> [...]
>
>> It looks like reduce_add() causes the NEON LLVM to generate a 
>> non-inlineable add_f32 function. Is there some good reason that this LLVM 
>> IR isn't marked alwaysinline?
>>
>
> Not that I can recall, and not that I can see from reviewing the code now. 
> More generally, I think(?) that just about all of the functions in 
> builtins/target-* should be marked as alwaysinline; stuff like 
> __half_to_float_uniform also deserves that treatment. As I look through the 
> code for other backends, the 'alwaysinline' stuff is similarly somewhat 
> inconsistent. I assume that most of the time LLVM just inlines the simple 
> stuff anyway, but it'd be nice to make sure there aren't other performance 
> bugs like that one.
>
> Any chance you could make the changes (for NEON at least), make sure 
> things still work, and submit a pull request?
>
> It does seem very odd that LLVM wouldn't automatically inline a function 
> consisting of a single instruction.
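For anyone less familiar with the attribute, here is a rough C analog of what's being proposed. Clang lowers `__attribute__((always_inline))` to LLVM's `alwaysinline` function attribute, so this is approximately what the marked-up builtins would look like to the optimizer. It is only a sketch: `hyp_add_f32` and `hyp_reduce_add4` are hypothetical names and do not reproduce the actual LLVM IR in ISPC's builtins/target-* files.

```c
/* Hypothetical helpers, NOT ISPC's real builtins. Declaring them
 * always_inline makes Clang tag them `alwaysinline` in the emitted
 * LLVM IR, so calls are inlined even when the inliner's cost model
 * would otherwise leave a call in place. */

/* A single-operation helper, loosely analogous to the add_f32
 * function mentioned in the thread. */
static inline __attribute__((always_inline))
float hyp_add_f32(float a, float b) {
    return a + b;
}

/* Horizontal sum of a 4-wide "vector" built from the helper above;
 * every call to hyp_add_f32 is required to be inlined here. */
static inline __attribute__((always_inline))
float hyp_reduce_add4(const float v[4]) {
    return hyp_add_f32(hyp_add_f32(v[0], v[1]),
                       hyp_add_f32(v[2], v[3]));
}
```

Calling `hyp_reduce_add4` on `{1.0f, 2.0f, 3.0f, 4.0f}` yields `10.0f`, and with the attribute in place no out-of-line call to `hyp_add_f32` should remain in the generated code.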

I've asked my employer for time to prepare a pull request. If it's 
granted, I'm happy to oblige.

Niall

-- 
You received this message because you are subscribed to the Google Groups 
"Intel SPMD Program Compiler Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
