lround and friends have been a big performance problem at times. Every time you can use cast(int) instead, it's way faster.
I didn't know this trick. It generates almost the same SSE instruction (though it truncates rather than rounds) and has the advantage of being inlinable.
Is it documented somewhere? If not, it should be.
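
For anyone else reading along, here is a minimal sketch of the difference, assuming std.math.lround is available on your platform. The helper names truncToInt and roundViaCast are made up for illustration and are not part of Phobos. Note that the add-0.5 variant is only a rough substitute for lround: it assumes the value fits in an int and can disagree with lround for inputs a hair below .5, so treat it as a sketch rather than a drop-in replacement.

```d
import std.math : lround;

/// Hypothetical helper: a plain cast truncates toward zero
/// (it does NOT round to nearest like lround does).
int truncToInt(double x)
{
    return cast(int) x;
}

/// Hypothetical helper: approximate round-half-away-from-zero built on
/// the truncating cast. Assumes x fits in an int; edge cases just below
/// .5 may differ from lround because x + 0.5 is itself rounded.
int roundViaCast(double x)
{
    return cast(int)(x < 0 ? x - 0.5 : x + 0.5);
}
```

So truncToInt(2.7) gives 2 where lround(2.7) gives 3, and roundViaCast(2.7) gets back to 3. If truncation is all you need, the bare cast is enough; the extra add is only there when you actually want rounding.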