David

All this stuff is terribly paged-out for me.  But I'd love someone to pay 
attention to some of this back-end stuff, so if you'd like to work there, I'd 
like to help you.

David Terei was also thinking of getting rid of proc point splitting for LLVM
(see his attached email and the notes attached to it).

Simon Marlow also was thinking about this; see his attached email.

The main *reason* for establishing a proc-point is so that, when generating C 
code (which we don't do any more), you can take its address.  E.g.

foo() {
  ...
  Push &bar onto the stack (effectively a return address)
  Jump to thingumpy
}

bar() {
  ...
}

Really bar is part of foo, but we have to make it a separate top-level thing so 
that we can take the address of bar and push it on the stack.
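
For concreteness, roughly the Haskell source that gives rise to this shape
(just a sketch; 'thingumpy' here stands in for any out-of-line function):

{-# NOINLINE thingumpy #-}
thingumpy :: Int -> Int       -- stands in for any out-of-line function
thingumpy n = n * 2

foo :: Int -> Int
foo x = case thingumpy x of   -- the call pushes a return address...
          y -> y + 1          -- ...and this continuation is what becomes bar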

The trouble is that if there is a label inside bar that foo wants to jump to 
(i.e. without pushing &bar etc) then we have to make that label too into a 
proc-point, so that both foo and bar can see it.  Urgh.

In LLVM this probably is not true; presumably you can take the address of any 
label?

Avoiding proc-point splitting would be GREAT.

Simon


|  -----Original Message-----
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of
|  David Spitzenberg
|  Sent: 12 November 2015 16:25
|  To: ghc-devs@haskell.org
|  Subject: Questions concerning the LLVM backend - i.e. 'proc point
|  splitting'
|  
|  Hello everybody,
|  
|  I've got some questions concerning the LLVM backend, specifically the
|  pass 'proc point splitting' in the Cmm pipeline.
|  
|  I'm new to this list as well as to computer science in general - at
|  least in some sense. Being a third-semester student of computer
|  science, I'm not quite sure whether I'm addressing the right list
|  with this mail. Feel free to correct me if not.
|  
|  I'm doing numerical stencil calculations which heavily depend on
|  nested loops expressed as tail-recursive functions nested one inside
|  another. When compiling a module, GHC seems capable of emitting
|  pretty good Cmm code, at least as long as the loop bodies do not
|  contain any calls to non-inlineable functions. In most cases
|  'loopification' fires and transforms the tail-recursive calls into
|  loops.
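|  
|  As a rough sketch, the kind of nested tail-recursive loops I mean
|  looks something like this ('update' is just a placeholder for the
|  real stencil update):
|  
|  stencil :: Int -> Int -> Double -> Double
|  stencil ni nj acc0 = outer 0 acc0
|    where
|      -- outer loop over i, inner loop over j; both recurse via tail calls
|      outer i acc = if i >= ni then acc
|                               else inner i 0 acc
|      inner i j acc = if j >= nj then outer (i + 1) acc
|                                 else inner i (j + 1) (update i j acc)
|      -- placeholder for the real stencil update
|      update i j acc = acc + fromIntegral (i * j)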
|  
|  When calling into non-inlineable functions, 'proc point splitting' is
|  performed. This splits the loops into several mutually recursive
|  functions, thereby significantly reducing the opportunities 'opt' has
|  to optimize the code later on. This leads to pretty bad performance
|  of the whole calculation.
|  
|  To avoid the presence of proc points within a loop body - and thereby
|  the splitting of the body - I currently try to avoid calls into
|  non-inlineable functions by merging the nested recursive functions
|  into one giant tail-recursive function. This works as long as
|  everything not defined in the module being compiled can be inlined.
|  Clearly, this is not a long-term solution.
|  
|  
|  
|  My questions:
|  
|   - The reason to perform 'proc point splitting' is that LLVM IR lacks
|  the ability to jump into an arbitrary basic block of a function body
|  from outside the function itself. Is this understanding correct?
|  
|   - Are there any other ways to avoid 'proc point splitting' when
|  using LLVM as a backend?
|  
|   - I recently read the article on an improved LLVM backend in the GHC
|  wiki (https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend), but
|  couldn't find any suggestions on the topic of 'proc point splitting'.
|  Are there any plans to solve the problem, either by teaching LLVM IR
|  the missing functionality or in some other way?
|  
|  
|  
|  I'd really like to contribute to GHC by working on this problem, but
|  I'm not quite sure if I'm capable of doing so. My background
|  knowledge is still somewhat limited, but I'm working on it ;-)
|  
|  Thanks in advance for your help!
|  
|  Regards,
|  
|  David Spitzenberg
|  
|  PS.: Thanks to everyone for the great work you're doing on GHC!
|  _______________________________________________
|  ghc-devs mailing list
|  ghc-devs@haskell.org
|  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
--- Begin Message ---
Hmmm my brain has mostly emptied of this stuff but I think there were
two main issues I was looking at:

 - Call conventions use stack passing too much. There was some issue
where the new code gen would basically unify any entry points to
functions by passing the args on the stack. So you'd get some really
bad code where lots of functions had extra entry-point functions that
would copy from the registers to the stack to call the main function
(kind of like slow entry points, but unnecessary ones). Sometimes
you'd get really bad call graphs where arguments would be copied from
stack to registers and back to stack, or vice versa. I think we fixed
up some of these issues by implementing the specialised GC calls, but
the underlying problem with how calling conventions are handled
remains.

(The above description is probably pretty wrong as it's from memory,
but it should be enough to point Edward in the right direction.)

- Death to proc point splitting. We also discussed, and I briefly
looked at (but never really had time to make progress on), pushing the
proc point splitting pass down the chain so that it was only done for
the C backend. The NCG would then have much larger functions that
would hopefully be nicer to optimise. What the LLVM backend would do
here is a little tricky. It has basically the same limitations as the
C backend, so it kind of needs proc point splitting. However, the main
benefit of removing proc point splitting is to give LLVM larger
optimisation opportunities, so unless a good alternative way of
handling proc points for the LLVM backend is found, I don't know how
useful this work would be. The NCG would probably get no performance
benefit.

I've attached some notes I wrote up while working/thinking about the
proc point splitting stuff. I've always been meaning to add them to
the wiki; they're mostly complete and probably mostly stuff you
already know, Edward, but they explain proc point splitting and why
it'd be nice to remove it.

One thing in the backend I've been wanting to do but have no time for
is to carry around the live STG register information in the Cmm
representation. If I knew, when generating LLVM code, which STG
registers were live on function entry, function exit and across calls,
then I think I could generate significantly better code in many
situations. At some point we should definitely do this.

Cheers,
David.

On 18 March 2011 04:05, Simon Peyton-Jones <simo...@microsoft.com> wrote:
> Edward,
>
> There are three things we need before switching to the new codegen pipeline:
> a) compiled programs work
> b) they run as fast as with the old codegen pipeline
> c) the compiler runs as fast (or nearly so)
>
> I think you have more or less cracked (a).  You are now thinking about (c).  
> But (b) is important too, perhaps more so than (c).  Just compare the 
> ultimate C-- produced by the old pipeline with that produced by the new one.  
> David Terei did some of this and found some egregious worsening.  Fixing 
> these will make the new path generate better code.
>
> David: did you put all your work into the tree, or do you have leftover 
> patches?   Do you have any pointers for what you planned to tackle next?
>
> Simon
>
> |  -----Original Message-----
> |  From: Edward Z. Yang [mailto:ezy...@mit.edu]
> |  Sent: 17 March 2011 17:54
> |  To: Simon Peyton-Jones
> |  Cc: Simon Marlow
> |  Subject: RE: Hoopl profiling
> |
> |  Thanks!
> |
> |  My current plan of attack is to reduce some of the performance pains seen
> |  in real GHC code into something that I can put into the Hoopl test suite:
> |  right now arfGraph doesn't even show up in the top cost-centers, and it
> |  looks like it's a very specific kind of code that tickles this behavior.
> |
> |  Cheers,
> |  Edward
> |
> |  Excerpts from Simon Peyton-Jones's message of Wed Mar 16 12:38:21 -0400 2011:
> |  > You're more than welcome to hang out here -- but I won't have many
> |  > cycles, that's all.
> |  >
> |  > S
> |  >
> |  > | -----Original Message-----
> |  > | From: Edward Z. Yang [mailto:ezy...@mit.edu]
> |  > | Sent: 16 March 2011 15:40
> |  > | To: Simon Peyton-Jones
> |  > | Cc: Simon Marlow
> |  > | Subject: RE: Hoopl profiling
> |  > |
> |  > | Sure. I can also continue working quasi-independently (as I've been
> |  > | doing during term), which hopefully doesn't take up too much time.
> |  > |
> |  > | Edward
> |  > |
> |  > | Excerpts from Simon Peyton-Jones's message of Wed Mar 16 06:24:40 -0400 2011:
> |  > | > URGH. That's a maximally busy week -- ICFP deadline on 24th, many
> |  > | > interviews.  Is there any possibility of deferring?
> |  > | >
> |  > | > Simon will be less tied up than me though
> |  > | >
> |  > | > S
> |  > | >
> |  > | > | -----Original Message-----
> |  > | > | From: Edward Z. Yang [mailto:ezy...@mit.edu]
> |  > | > | Sent: 16 March 2011 09:47
> |  > | > | To: Simon Peyton-Jones
> |  > | > | Cc: Simon Marlow
> |  > | > | Subject: RE: Hoopl profiling
> |  > | > |
> |  > | > | Excerpts from Simon Peyton-Jones's message of Wed Mar 16 05:24:04 -0400 2011:
> |  > | > | > Can you remind me when you plan to be over here during the vac?  I'm
> |  > | > | > totally maxed out with ICFP submissions, and will be until 25th.  Then
> |  > | > | > things ease off -- slightly!
> |  > | > |
> |  > | > | I'm planning on coming in next week (21st). Does that work?
> |  > | > |
> |  > | > | Edward
>
>

Attachment: cmm-ncg-notes
Description: cmm-ncg-notes


--- End Message ---
--- Begin Message ---
I've been thinking a little about this.  If some extension to LLVM is needed to 
support TNTC, then I think it might be better to go all the way and support 
arbitrary labels with info tables, not just top-level procedures.  Also, it has 
to be possible to take the address of a label.  Obviously, such labels cannot 
be optimised away, and there has to be enough information in the intermediate 
code to be able to construct control-flow graphs.

The reason to do this is that it will let us generate much better code from 
GHC.  In the absence of support for labels with info tables, the code generator 
has to do something called "proc point analysis" and split up procedures in 
order to guarantee that (a) only top-level procs have info tables, and (b) we 
never jump into the middle of a proc.  Every time we split up a proc, 
everything that is live in local variables has to be saved or passed to the 
other procedure explicitly, which adds overhead.  Furthermore, to generate code 
that doesn't do this a lot, the Stg-to-Cmm code generator has to be "proc-point 
aware" and avoid generating code that requires too many proc-point splits (I've 
been doing quite a lot of this in the new codegen recently).

Even simple loops will suffer from this: e.g. in the new code generator a 
recursive let-no-escape will compile directly into a loop - that is, unless it 
contains a heap check or stack check or even just a call to another function; 
any of these will force the let-no-escape to be a proc point, forcing all its 
parameters to be explicitly saved/restored on every iteration.
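
As a small illustrative sketch (not real GHC code; 'expensive' just stands in 
for any out-of-line call):

count :: Int -> Int
count n = go 0 0
  where
    -- 'go' is the recursive let-no-escape: with only arithmetic in its body
    -- it compiles to a plain loop, but the out-of-line call to 'expensive'
    -- forces it to become a proc point, so x and acc have to be saved and
    -- restored around every iteration.
    go x acc
      | x >= n    = acc
      | otherwise = go (x + 1) (acc + expensive x)

{-# NOINLINE expensive #-}
expensive :: Int -> Int    -- stands in for any function that is not inlined
expensive y = y + 1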

We can make the NCG support arbitrary labels with info tables quite easily, and 
I'm very tempted to do it because it will let us generate much better code.  
But the LLVM route will not be able to benefit, unless we can find a way to do 
this with LLVM too.

Cheers,
        Simon

> -----Original Message-----
> From: David Terei [mailto:davidte...@gmail.com]
> Sent: 18 March 2012 23:39
> To: Sergiu Ivanov
> Cc: Simon Marlow; Simon Marlow; cvs-...@haskell.org
> Subject: Re: LLVM optimisation passes / tables next to code
>
> Sounds fine. However, it may be best to start with the native code
> generator (NCG). It will also need to be changed to make sure the backends
> are all compatible. It is probably easier to work with for implementing the
> proposed TNTC alternative. We can then test that this alternative works and
> performs well.
>
> Cheers,
> David
>
> On 17 March 2012 13:10, Sergiu Ivanov <unlimitedscol...@gmail.com> wrote:
> > Hello,
> >
> > I've seen Chris Lattner of LLVM voice support for adding blobs of
> > inline assembler at the start of functions, which sounds like good
> > news!
> >
> > I'll now be (quickly) reading http://llvm.org/docs/tutorial/ to get an
> > overall picture, and I will also try to fix a bug to show that I'm
> > actually capable of coding.
> >
> > Does this plan sound good?
> >
> > Sergiu
> >
> > P.S. Unfortunately, my time is quite limited now, since the university
> > consumes quite a lot of my effort.  I apologise in advance for my
> > sluggishness :-(



_______________________________________________
Cvs-ghc mailing list
cvs-...@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc


--- End Message ---
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs