On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote: > Hi All, > > I'm writing some variable byte codec routines, which are used in > inner^N loops, and which I want to make really efficient. There are > several spots where the code uses lookup tables. > > The best example is the table, which given the first byte, returns the > number of additional bytes. It is a precomputed version of the > following function: > > >codeLength :: Word8 -> Int > >codeLength w > > | w .&. 0x80 == 0 = 0 > > | otherwise = 1 + (codeLength $ w `shiftL` 1) > > from which we construct a 256 entry table: > > codeLen 0 = 0 > codeLen 1 = 0 > codeLen 2 = 0 > ... > codeLen 127 = 0 > codeLen 128 = 1 > ... > codeLen 255 = 8 > > Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long > chain of conditional branches. Now, even taking laziness into account, > this seems like terribly inefficient code. I wrote this thinking it > would be more efficient than constructing a CAF array: > > codeLens = listArray (0,255) (map codeLength [0..255]) > > but I'm guessing the CAF version would probably work out much more > efficient in the long run. > > However, I think ghc (and any other compiler), should detect the > following properties: > > 1. all the clauses are mutually exclusive, so the sequencing is > irrelevant to the semantics > > 2. Given an explicit type declaration Word8 -> ...., the 256 cases > cover all the possible constructors of the type, so there are no > missing clauses. > > I would have expected the generated code to have the form: > > codeLen: > x <- pick up arg > return closure (codeLen' x) > > codeLen': > x' <- force x > update precomputed-table[x'] > > Even if you leave out property 2 and include bounds checks, this seems > like an important kind of function to optimize well. So what have I > missed, or is it time to learn how to hack on ghc?
I'd suggest learning how to hack on GHC, I get a chain of branches even at the maximum optimization setting. Hackers are always appreciated! For an early project, how about http://hackage.haskell.org/trac/ghc/ticket/1261 :) (-O3 is misparsed as an extremely low setting, worse than -O and sometimes worse than -Onot) BTW, you should definitely look at the FFI for small performance critical functions. It's already a miracle that Haskell is as fast as it is; I would not hold high hopes for idiomatic general Haskell reaching C speed any time soon. (For some special cases like operating on multi-MB strings and large parallel arrays, there is definite promise.) Of course the interface is fairly costly, be sure to benchmark. Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe