Ryan

Yes, I’m sure there is room for improvement here!   Good observations.

In particular, while I’d like DATA to support this, I believe that most ‘eval’s 
find that the thing being evaluated is already evaluated; so the fast-path 
should be that case.  We should bet for evaluated.

NB, however, that the stack-overflow check applies to the entire function body. 
 Suppose we do
   f x = case x of (p,q) -> case p of True -> False; False -> True
Then we can eliminate the stack check only if both the eval of ‘x’ and the eval 
of ‘p’ both take the fast path.

Another avenue you might like to think about is this: if function f calls g, 
then g’s stack-overflow test could perhaps be absorbed into ‘f’s.  So
   f x = case (g x) of True -> False; False -> True
So maybe f could call a super-fast entry point for g that didn’t have a stack 
overflow test!

(Watch out: loops, recursion, etc)

I’m happy to advise.

Simon


From: Ryan Newton [mailto:rrnew...@gmail.com]
Sent: 23 October 2015 19:31
To: Simon Peyton Jones
Cc: ghc-devs@haskell.org; Ömer Sinan Ağacan; Ryan Scott; Chao-Hong Chen; Johan 
Tibell
Subject: Re: Better calling conventions for strict functions (bang patterns)?

Ah, yes, so just to give a concrete example in this thread, if we take the 
`foo` function above and say `map foo ls`, we may well get unevaluated 
arguments to foo.  (And this is almost precisely the same as the first example 
that Strict-Core paper!)

Thanks for the paper reference.  I read it and it's great -- just what I was 
looking for.  An approach that eliminates any jealousy of ML/Scheme compiler 
techniques vis a vis calling conventions ;-).  I'm also wondering if there are 
some incremental steps that can be taken, short of what is proposed in the 
paper.

  1.  Small tweaks: The CMM code above seems to be betting than the thunk is 
unevaluated, because it does the stack check and stack write before the 
predicate test that checks if the thunk is evaluated (if (R1 & 7 != 0) goto 
c3aO; else goto c3aP;).  With a bang-pattern function, couldn't it make the 
opposite bet?  That is, branch on whether the thunk is evaluated first, and 
then the wasted computation is only a single correctly predicted branch (and a 
read of a tag that we need to read anyway).
  2.  The option of multiple entrypoints which is considered and discarded as 
fragile in the beginning of the paper (for direct call vs indirect / 1st order 
vs higher order).  That fragile option is along the lines of what I wanted to 
discuss on this thread.  It does seem like a tricky phase ordering concern, but 
how bad is it exactly?  The conflict with the a case-expr rewrite is 
illustrated clearly in the paper, but that just means that such optimizations 
must happen before the final choice of which function entrypoint to call, 
doesn't it?  I'm not 100% sure where it could go in the current compiler 
pipeline, but couldn't the adjustment of call target from "foo" to 
"foo_with_known_whnf_args" happen quite late?
Cheers,
  -Ryan

P.S. One of the students CC'd, Ryan Scott, is currently on internship at Intel 
labs and is working to (hopefully) liberate the Intell Haskell Research 
Compiler as open source.  Like the 2009 paper, it also uses a strict IR, and I 
think it will be interesting to see exactly how it handles the conversion from 
Core to its IR.  (Probably the same as Fig 10 in the paper.)


On Fri, Oct 23, 2015 at 10:11 AM, Simon Peyton Jones 
<simo...@microsoft.com<mailto:simo...@microsoft.com>> wrote:
It’s absolutely the case that bang patterns etc tell the caller what to do, but 
the function CANNOT ASSUME that its argument is evaluated.  Reason: higher 
order functions.

I think that the way to allow functions that can assume their arg is evaluated 
is through types: see Type are calling 
conventions<https://na01.safelinks.protection.outlook.com/?url=http:%2f%2fresearch.microsoft.com%2f~simonpj%2fpapers%2fstrict-core%2ftacc-hs09.pdf&data=01%7C01%7Csimonpj%40064d.mgd.microsoft.com%7C6c26443b080f47ded44d08d2dbd82785%7C72f988bf86f141af91ab2d7cd011db47%7C1&sdata=apsJhmnvf2Y0M2KhNcmPzg%2bqkd5zYl2MmDYvOcCObJA%3d>.
  But it’d be a fairly big deal to implement.

Simon


From: ghc-devs 
[mailto:ghc-devs-boun...@haskell.org<mailto:ghc-devs-boun...@haskell.org>] On 
Behalf Of Ryan Newton
Sent: 23 October 2015 14:54
To: ghc-devs@haskell.org<mailto:ghc-devs@haskell.org>; Ömer Sinan Ağacan; Ryan 
Scott; Chao-Hong Chen; Johan Tibell
Subject: Better calling conventions for strict functions (bang patterns)?

Hi all,

With module-level Strict and StrictData pragmas coming soon, one obvious 
question is what kind of the code quality GHC can achieve for strict programs.

When it came up in discussion in our research group we realized we didn't 
actually know whether the bang patterns, `f !x`, on function arguments were 
enforced by caller or callee.

Here's a Gist that shows the compilation of a trivial function:


foo :: Maybe Int -> Int


foo !x =


  case x of


   Just y -> y


   
https://gist.github.com/rrnewton/1ac722189c65f26fe9ac<https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgist.github.com%2frrnewton%2f1ac722189c65f26fe9ac&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cb006dcdbfe834ebb6c1e08d2dbb16c03%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=qxrT8r1VSP97xQUF2qqkLlxEtSGi9VOzfmORl25W%2fWY%3d>

If that function is compiled to *assume* its input is in WHNF, it should be 
just as efficient as the isomorphic MLton/OCaml code, right?  It only needs to 
branch on the tag, do a field dereference, and return.

But as you can see from the STG and CMM generated, foo does indeed enter the 
thunk, adding an extra indirect jump.  Here's the body:


      c3aY:


          if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;


      c3aZ:


          // nop


          R1 = PicBaseReg + foo_closure;


          call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;


      c3b0:


          I64[Sp - 8] = PicBaseReg + block_c3aO_info;


          R1 = R2;


          Sp = Sp - 8;


          if (R1 & 7 != 0) goto c3aO; else goto c3aP;


      c3aP:


          call (I64[R1])(R1) returns to c3aO, args: 8, res: 8, upd: 8;


      c3aO:


          if (R1 & 7 >= 2) goto c3aW; else goto c3aX;


      c3aW:


          R1 = P64[R1 + 6] & (-8);


          Sp = Sp + 8;


          call (I64[R1])(R1) args: 8, res: 0, upd: 8;


      c3aX:


          R1 = PicBaseReg + lvl_r39S_closure;


          Sp = Sp + 8;


          call (I64[R1])(R1) args: 8, res: 0, upd: 8;



The call inside c3aP is entering "x" as a thunk, which also incurs all of the 
stack limit check code.  I believe that IF the input could be assumed to be in 
WHNF, everything above the label "c3aO" could be omitted.

So... if GHC is going to be a fabulous pure and imperative language, and a 
fabulous lazy and strict compiler/runtime.. is there some work we can do here 
to improve this situation? Would the following make sense:

  *   Put together a benchmark suite of all-strict programs with 
Strict/StrictData (compare a few benchmark's generated code to MLton, if time 
allows)
  *   Modify GHC to change calling conventions for bang patterns -- caller 
enforces WHNF rather than callee.  Existing strictness/demand/cardinality 
analysis would stay the same.
Unless there's something I'm really missing here, the result should be that you 
can have a whole chain of strict function calls, each of which knows its 
arguments and the arguments it passes to its callees are all in WHNF, without 
ever generating thunk-entry sequences.

Thanks for your time,
  -Ryan


_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to