Simon Peyton-Jones <[EMAIL PROTECTED]> wrote,

> | However, we came across one problem (read, lack of
> | optimisation on GHC's part), which leads to tedious
> | duplication of a lot of code in our array library.
> | Basically, GHC does not recognise for tail recursive
> | functions when certain arguments (accumulators maintained in
> | a loop) can be unboxed.  This leads to massive overheads in
> | our code.  Currently, we circumvent the inefficiency by
> | having manually specialised versions of the loops for
> | different accumulator types and using RULES to select them
> | where appropriate (based on the type information).  I will
> | send you some example code illustrating the problem soon.
> 
> Yes please.

I have found a way of rephrasing the definition so that it
is properly optimised by GHC.  However, I think, it should
be possible to do this automatically and it is maybe not
unlike the optimisation done by simplCore/LiberateCase.

The code I would like to write is, for example, the
following

  import PrelGHC
  import PrelBase
  import PrelST

  fill :: MutableByteArray# s
       -> (acc -> Int)
       -> (acc -> acc)
       -> Int
       -> acc
       -> ST s acc
  {-# INLINE fill #-}
  fill mba# f g (I# n#) start = fill0 0# start
    where
      fill0 i# acc | i# ==# n# = return acc
                   | otherwise = do
                                   writeIntArray mba# (I# i#) (f acc)
                                   fill0 (i# +# 1#) (g acc)

  writeIntArray :: MutableByteArray# s -> Int -> Int -> ST s ()
  {-# INLINE writeIntArray #-}
  writeIntArray mba# (I# i#) (I# e#) = ST $ \s# ->
    case writeIntArray# mba# i# e# s#  of {s2#   ->
    (# s2#, () #)}

  foo mba# n = fill mba# id (+1) 1000 0

The interesting part is the handling of the accumulator.
After inlining `fill' into `foo', it becomes obvious that
the accumulator can be maintained as an unboxed integer.
Unfortunately, it is not obvious to GHC, which generates the
following (this is just the inlined `fill0' loop):

        __letrec {
          $wfill0 :: (PrelGHC.Int#
                      -> PrelBase.Int
                         -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
          __A 3 __C
          $wfill0
            = \ w2 :: PrelGHC.Int#
                w3 :: PrelBase.Int
                w4 :: (PrelGHC.State# s) ->
                  case w2 of wild {
                      1000 -> (# w4, w3 #);
                      __DEFAULT ->
                          case w3 of wild1 { PrelBase.I# e# ->
                          case PrelGHC.writeIntArray# @ s w wild e# w4 of s2# { 
__DEFAULT ->
                          case PrelGHC.+# e# 1 of a { __DEFAULT ->
                          let {
                            sat :: PrelBase.Int
                            __A 0 __C
                            sat
                              = PrelBase.$wI# a
                          } in 
                            case PrelGHC.+# wild 1 of sat1 { __DEFAULT ->
                            $wfill0 sat1 sat s2#
                            }
                          }
                          }
                          }
                  };
        } in  $wfill0 0 Test.lit w1

The accumulator (w3) is unboxed immediately before the
writeIntArray# and its next value put into a box (sat) -
only to be unboxed immediately again in the next loop
iteration.

This would make perfect sense when the definition of `foo'
were

  foo mba# n = fill mba# id plus 1000 0
    where
      plus 0 = error "Die horribly"
      plus x = x + 1

I also appreciate that, if the loop is executed zero times,
the initial value of `acc' is not demanded.  But this is not
much different to the case handled by simplCore/LiberateCase.

And indeed with a little help, GHC generates much better
code.  In the following, I rewrote `fill' to explicitly test
for input values that make the loop execute zero times:

  fill mba# f g (I# 0#) start = return start
  fill mba# f g (I# n#) start = fill0 0# start
    where
      fill0 i# acc = do
                       writeIntArray mba# (I# i#) (f acc)
                       let i'#  = i# +# 1#
                           acc' = g acc
                       if i'# ==# n# then return acc' else fill0 i'# acc'

Now, `acc' is guaranteed to be used in each invocation of
`fill0' and GHC generates:

        __letrec {
          $wfill0 :: (PrelGHC.Int#
                      -> PrelGHC.Int#
                         -> PrelGHC.State# s -> (PrelGHC.State# s, PrelBase.Int))
          __A 3 __C
          $wfill0
            = \ w2 :: PrelGHC.Int#
                ww :: PrelGHC.Int#
                w3 :: (PrelGHC.State# s) ->
                  case PrelGHC.writeIntArray# @ s w w2 ww w3 of s2# { __DEFAULT ->
                  case PrelGHC.+# w2 1 of wild {
                      1000 ->
                          case PrelGHC.+# ww 1 of a { __DEFAULT ->
                          let {
                            a1 :: PrelBase.Int
                            __A 0 __C
                            a1
                              = PrelBase.$wI# a
                          } in  (# s2#, a1 #)
                          };
                      __DEFAULT ->
                          case PrelGHC.+# ww 1 of sat { __DEFAULT -> $wfill0 wild sat 
s2# }
                  }
                  };
        } in  $wfill0 0 0 w1

A nice tight loop.

However, the initial version of `fill' is the more natural
one to write.  I think, it should be possible to derive the
second version (are at least a similar version)
automatically from the initial code.  The derivation might
go roughly as follows:

    fill mba# f g (I# n#) start = fill0 0# start
      where
        fill0 i# acc = case i# ==# n# of
                         True  -> return acc
                         False -> do
                                    writeIntArray mba# (I# i#) (f acc)
                                    fill0 (i# +# 1#) (g acc)

  === {pull case out of fill0 (ie, partial unfolding)}

    fill mba# f g (I# n#) start = 
      case 0# ==# n# of
        True  -> return start
        False -> fill0 0# start
      where
        fill0 i# acc = do
                         writeIntArray mba# (I# i#) (f acc)
                         case (i# +# 1#) ==# n# of
                           True  -> return (g acc)
                           False -> fill0 (i# +# 1#) (g acc)

This is essentially the recursive variant of a well known
law for while loops:

    while p do q;
  ===
    if p then do q while p;

Wouldn't this actually subsume the liberate case rule?

    f = \ t -> case v of
                   V a b -> a : f t

  === {pull out the case}

    f = case v of
          V a b -> f = \ t -> a : case v of
                                    V a b -> f t

  === {simplification}

    f = case v of
          V a b -> f = \ t -> a : f t

This might be more complicated to implement, as we only
partially unfold the recursive function, but it also has
more scope.

What do you think?

Cheers,
Manuel

PS: All Core code was generated with the HEAD from two days ago.

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Reply via email to