[Haskell-cafe] Switch optimization

2007-06-10 Thread Thomas Conway

Hi All,

I'm writing some variable byte codec routines, which are used in
inner^N loops, and which I want to make really efficient. There are
several spots where the code uses lookup tables.

The best example is the table, which given the first byte, returns the
number of additional bytes. It is a precomputed version of the
following function:


codeLength :: Word8 - Int
codeLength w
 | w .. 0x80 == 0   = 0
 | otherwise = 1 + (codeLength $ w `shiftL` 1)


from which we construct a 256 entry table:

codeLen 0 = 0
codeLen 1 = 0
codeLen 2 = 0
...
codeLen 127 = 0
codeLen 128 = 1
...
codeLen 255 = 8

Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
chain of conditional branches. Now, even taking laziness into account,
this seems like terribly inefficient code.  I wrote this thinking it
would be more efficient than constructing a CAF array:

codeLens = listArray (0,255) (map codeLength [0..255])

but I'm guessing the CAF version would probably work out much more
efficient in the long run.

However, I think ghc (and any other compiler), should detect the
following properties:

1. all the clauses are mutually exclusive, so the sequencing is
irrelevant to the semantics

2. Given an explicit type declaration Word8 - , the 256 cases
cover all the possible constructors of the type, so there are no
missing clauses.

I would have expected the generated code to have the form:

codeLen:
   x - pick up arg
  return closure (codeLen' x)

codeLen':
   x' - force x
   update precomputed-table[x']

Even if you leave out property 2 and include bounds checks, this seems
like an important kind of function to optimize well. So what have I
missed, or is it time to learn how to hack on ghc?

T.
--
Thomas Conway
[EMAIL PROTECTED]

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Switch optimization

2007-06-10 Thread Stefan O'Rear
On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote:
 Hi All,
 
 I'm writing some variable byte codec routines, which are used in
 inner^N loops, and which I want to make really efficient. There are
 several spots where the code uses lookup tables.
 
 The best example is the table, which given the first byte, returns the
 number of additional bytes. It is a precomputed version of the
 following function:
 
 codeLength :: Word8 - Int
 codeLength w
  | w .. 0x80 == 0   = 0
  | otherwise = 1 + (codeLength $ w `shiftL` 1)
 
 from which we construct a 256 entry table:
 
 codeLen 0 = 0
 codeLen 1 = 0
 codeLen 2 = 0
 ...
 codeLen 127 = 0
 codeLen 128 = 1
 ...
 codeLen 255 = 8
 
 Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
 chain of conditional branches. Now, even taking laziness into account,
 this seems like terribly inefficient code.  I wrote this thinking it
 would be more efficient than constructing a CAF array:
 
 codeLens = listArray (0,255) (map codeLength [0..255])
 
 but I'm guessing the CAF version would probably work out much more
 efficient in the long run.
 
 However, I think ghc (and any other compiler), should detect the
 following properties:
 
 1. all the clauses are mutually exclusive, so the sequencing is
 irrelevant to the semantics
 
 2. Given an explicit type declaration Word8 - , the 256 cases
 cover all the possible constructors of the type, so there are no
 missing clauses.
 
 I would have expected the generated code to have the form:
 
 codeLen:
x - pick up arg
   return closure (codeLen' x)
 
 codeLen':
x' - force x
update precomputed-table[x']
 
 Even if you leave out property 2 and include bounds checks, this seems
 like an important kind of function to optimize well. So what have I
 missed, or is it time to learn how to hack on ghc?

I'd suggest learning how to hack on GHC, I get a chain of branches even
at the maximum optimization setting.  Hackers are always appreciated!

For an early project, how about
http://hackage.haskell.org/trac/ghc/ticket/1261 :)

(-O3 is misparsed as an extremely low setting, worse than -O and
sometimes worse than -Onot)

BTW, you should definitely look at the FFI for small performance
critical functions.  It's already a miracle that Haskell is as fast as
it is; I would not hold high hopes for idiomatic general Haskell
reaching C speed any time soon.  (For some special cases like operating
on multi-MB strings and large parallel arrays, there is definite
promise.)  Of course the interface is fairly costly, be sure to
benchmark.

Stefan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Switch optimization

2007-06-10 Thread Tom Pledger

Stefan O'Rear wrote:
 | On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote:
 :
 |  codeLen 127 = 0
 |  codeLen 128 = 1
 |  ...
 |  codeLen 255 = 8
 | 
 |  Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
 |  chain of conditional branches.
 :

That's deeply tied in with Num being a subclass of Eq, isn't it?   
Pattern matching for /non-numeric/ data constructors should be more  
direct, at any optimisation level, thanks to the taglessness of the  
STG machine.


 | I'd suggest learning how to hack on GHC, I get a chain of branches even
 | at the maximum optimization setting.  Hackers are always appreciated!

Such a GHC hack may have to be limited to some known, standard numeric  
types.  User-defined numeric types may not provide anything that you  
can use to look up a jump table.


How would an array go, as a user-level optimisation?

Regards,
Tom


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe