Ian Lynagh wrote:
> Hi all,
>
> I was under the impression that simple code like the below, which swaps
> the endianness of a block of data, ought to be near C speed:
>
> [...]
>       poke p (shiftL x 24 .|. shiftL (x .&. 0xff00) 8
>                           .|. (shiftR x 8 .&. 0xff00)
>                           .|. shiftR x 24)
> [...]

The problem here is that the shiftL and shiftR operations don't get
inlined properly. They are replaced by a call to shift, and that call
is not inlined either. The shift function also wastes some more time by
checking the sign of the shift amount. A few well-placed INLINE pragmas
in the libraries might help.
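
To make that cost concrete, here is a rough model of the call chain. It is only a sketch: checkedShift, slowShiftL and slowShiftR are hypothetical names, not the library's actual definitions (which differ between GHC versions), and it assumes a base new enough to export unsafeShiftL and unsafeShiftR from Data.Bits.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR)
import Data.Word (Word32)

-- Hypothetical stand-in for the general 'shift': it must inspect the
-- sign (and size) of the amount on every call before doing any work.
checkedShift :: Word32 -> Int -> Word32
checkedShift x n
  | n >= 32 || n <= -32 = 0
  | n >= 0              = x `unsafeShiftL` n
  | otherwise           = x `unsafeShiftR` (negate n)
{-# NOINLINE checkedShift #-}  -- models the call that is not inlined

-- Both directions funnel through the same out-of-line, checked call.
slowShiftL, slowShiftR :: Word32 -> Int -> Word32
slowShiftL x n = checkedShift x n
slowShiftR x n = checkedShift x (negate n)

main :: IO ()
main = do
  print (slowShiftL 0x000000ff 24)  -- 4278190080
  print (slowShiftR 0xff000000 24)  -- 255
```

With a constant amount such as 24, every one of those guards is dead weight in the inner loop, which is why inlining (or bypassing the check) matters here.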


> Is there anything I can do to get better performance in this sort of
> code without resorting to calling out to C?

You could import some private GHC modules and use the primop directly:

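import Foreign.Marshal.Array (mallocArray)
import GHC.Exts (Int (I#))   -- needed for the I# constructor
import GHC.Prim
import GHC.Word

main :: IO ()
main = do p <- mallocArray 104857600
          foo p 104857600

-- Unchecked shifts; these clash with Data.Bits.shiftL/shiftR,
-- so hide those two names when importing Data.Bits.
shiftL (W32# a) (I# b) = W32# (shiftL# a b)
shiftR (W32# a) (I# b) = W32# (shiftRL# a b)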

Using those instead of the standard ones speeds up the program a lot.
Be aware, however, that you shouldn't use negative shift amounts with
them: the result is undefined and nothing is checked.
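
On a newer base than this thread assumes, a similar effect may be available without private modules: Data.Bits exports unsafeShiftL and unsafeShiftR, which skip the range check on the shift amount, and Data.Word exports byteSwap32. The following is only a sketch under that assumption, not the code from the original program; swap32 and swapBlock are hypothetical names, and no timings are claimed for it.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.Word (Word32, byteSwap32)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Reverse the four bytes of a word using unchecked shifts.
-- Shift amounts must lie in [0, 31]; anything else is undefined.
swap32 :: Word32 -> Word32
swap32 x =  (x `unsafeShiftL` 24)
        .|. ((x .&. 0xff00) `unsafeShiftL` 8)
        .|. ((x `unsafeShiftR` 8) .&. 0xff00)
        .|. (x `unsafeShiftR` 24)

-- Hypothetical in-place loop over n words starting at p.
swapBlock :: Ptr Word32 -> Int -> IO ()
swapBlock p n = go 0
  where
    go i
      | i >= n    = return ()
      | otherwise = do
          x <- peekElemOff p i
          pokeElemOff p i (swap32 x)
          go (i + 1)

main :: IO ()
main =
  allocaArray 2 $ \p -> do
    let input = [0x11223344, 0xaabbccdd]
    pokeArray p input
    swapBlock p 2
    ws <- peekArray 2 p
    -- Compare against the library's own byte swap.
    print (ws == map byteSwap32 input)  -- True
```

The same caveat applies as with the primops: the unchecked shifts trade safety for speed, so they are only appropriate when the shift amounts are known constants in range, as they are here.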

Cheers,

Wolfgang

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
