Bulat Ziganshin wrote:

i propose to do for start rather small change that will allow to
make things go several times faster and in particular outperform jhc
for the small "leaf" loops (i.e. loops that use only GHC primitives
and don't call other functions). that include factorial, "a[]+=b[i]"
loop that was discussed here several weeks ago, some shootout
examples, my own arrays serialization code and so on, so on

the idea is that the following STG code

f :: Int# -> Double# -> ... -> Int#

(i.e. function use only unboxed values/results)

f x y z = code1 in
  case expr1 of
    A -> codeA
    B -> codeB
    C -> codeC in f x_c y_c z_c
    D -> codeD in f x_d y_d z_d
    _ -> code0 in f x0 y0 z0

can be compiled to the following C/C-- code:

f() {
  x = sp[0];
  y = sp[4];
  z = sp[8];
  sp+=12;

  code1;
  while (expr1!=A) {
    if (expr1==B) then return codeB;
    else if (expr1==C) then {x=x_c; y=y_c; z=z_c;}
    else if (expr1==D) then {x=x_d; y=y_d; z=z_d;}
    else {x=x0; y=y0; z=z0;}
    code1;
  }
  return codeA;
}

this compilation don't require any changes in GHC memory model. all we
need:

1) add for/if/while to the C-- statement types list (data CmmStmt)

Please don't extend the C-- data type - it's very small and simple because that makes it easy to manipulate and reason about.

I don't see why you need to, either: you can already express for, if and while in C-- using conditional branches. I don't think gcc cares whether you write your code using labels and goto or while/for/if, it generates the same code either way.

2) implement recognizer for such simple STG functions (leaf unboxed
procedures recognizer)

3) implement the translation itself

By all means try this. What you want is to compile recursive functions like this:

  f() {
   x = arg1;
   y = arg2;
  L:
   ... body of f, with args mapped to x & y,
       and recursive calls jumping to L passing args in x & y.
  }

It's quite hard to do this as a C-- to C-- optimisation, at least without implementing a lot of other optimisations that we don't already have. I was hoping that gcc would do it for us, if we compile code like this:

  f() {
  L:
   ... body of f ...
   goto L;
  }

but sadly it doesn't do the whole job. (see cmmLoopifyForC in cmm/CmmOpt.hs, which I added today).

So you might try the hacky way of doing this transformation at a higher level, when generating the C-- in the first place. Putting the args in temporaries is the easy bit; generating different code for the recursive call will be more tricky.

Cheers,
        Simon
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Reply via email to