[Haskell-cafe] Switch optimization
Hi All, I'm writing some variable byte codec routines, which are used in inner^N loops, and which I want to make really efficient. There are several spots where the code uses lookup tables. The best example is the table, which given the first byte, returns the number of additional bytes. It is a precomputed version of the following function: codeLength :: Word8 - Int codeLength w | w .. 0x80 == 0 = 0 | otherwise = 1 + (codeLength $ w `shiftL` 1) from which we construct a 256 entry table: codeLen 0 = 0 codeLen 1 = 0 codeLen 2 = 0 ... codeLen 127 = 0 codeLen 128 = 1 ... codeLen 255 = 8 Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long chain of conditional branches. Now, even taking laziness into account, this seems like terribly inefficient code. I wrote this thinking it would be more efficient than constructing a CAF array: codeLens = listArray (0,255) (map codeLength [0..255]) but I'm guessing the CAF version would probably work out much more efficient in the long run. However, I think ghc (and any other compiler), should detect the following properties: 1. all the clauses are mutually exclusive, so the sequencing is irrelevant to the semantics 2. Given an explicit type declaration Word8 - , the 256 cases cover all the possible constructors of the type, so there are no missing clauses. I would have expected the generated code to have the form: codeLen: x - pick up arg return closure (codeLen' x) codeLen': x' - force x update precomputed-table[x'] Even if you leave out property 2 and include bounds checks, this seems like an important kind of function to optimize well. So what have I missed, or is it time to learn how to hack on ghc? T. -- Thomas Conway [EMAIL PROTECTED] Silence is the perfectest herald of joy: I were but little happy, if I could say how much. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Switch optimization
On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote: Hi All, I'm writing some variable byte codec routines, which are used in inner^N loops, and which I want to make really efficient. There are several spots where the code uses lookup tables. The best example is the table, which given the first byte, returns the number of additional bytes. It is a precomputed version of the following function: codeLength :: Word8 - Int codeLength w | w .. 0x80 == 0 = 0 | otherwise = 1 + (codeLength $ w `shiftL` 1) from which we construct a 256 entry table: codeLen 0 = 0 codeLen 1 = 0 codeLen 2 = 0 ... codeLen 127 = 0 codeLen 128 = 1 ... codeLen 255 = 8 Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long chain of conditional branches. Now, even taking laziness into account, this seems like terribly inefficient code. I wrote this thinking it would be more efficient than constructing a CAF array: codeLens = listArray (0,255) (map codeLength [0..255]) but I'm guessing the CAF version would probably work out much more efficient in the long run. However, I think ghc (and any other compiler), should detect the following properties: 1. all the clauses are mutually exclusive, so the sequencing is irrelevant to the semantics 2. Given an explicit type declaration Word8 - , the 256 cases cover all the possible constructors of the type, so there are no missing clauses. I would have expected the generated code to have the form: codeLen: x - pick up arg return closure (codeLen' x) codeLen': x' - force x update precomputed-table[x'] Even if you leave out property 2 and include bounds checks, this seems like an important kind of function to optimize well. So what have I missed, or is it time to learn how to hack on ghc? I'd suggest learning how to hack on GHC, I get a chain of branches even at the maximum optimization setting. Hackers are always appreciated! For an early project, how about http://hackage.haskell.org/trac/ghc/ticket/1261 :) (-O3 is misparsed as an extremely low setting, worse than -O and sometimes worse than -Onot) BTW, you should definitely look at the FFI for small performance critical functions. It's already a miracle that Haskell is as fast as it is; I would not hold high hopes for idiomatic general Haskell reaching C speed any time soon. (For some special cases like operating on multi-MB strings and large parallel arrays, there is definite promise.) Of course the interface is fairly costly, be sure to benchmark. Stefan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Switch optimization
Stefan O'Rear wrote: | On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote: : | codeLen 127 = 0 | codeLen 128 = 1 | ... | codeLen 255 = 8 | | Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long | chain of conditional branches. : That's deeply tied in with Num being a subclass of Eq, isn't it? Pattern matching for /non-numeric/ data constructors should be more direct, at any optimisation level, thanks to the taglessness of the STG machine. | I'd suggest learning how to hack on GHC, I get a chain of branches even | at the maximum optimization setting. Hackers are always appreciated! Such a GHC hack may have to be limited to some known, standard numeric types. User-defined numeric types may not provide anything that you can use to look up a jump table. How would an array go, as a user-level optimisation? Regards, Tom ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe