Re: The Curious Case of T6084 -or- Register Confusion with LLVM

2017-09-21 Thread Moritz Angermann
The issue is at the function definition. In the price point splitting code we 
determine that the F1 and D2 registers are not actually used in the body of 
`q`.  And as such optimize the set of live register tees from R1, F1, D2, F3, 
D4 to R1, F3, D4.

Thus in https://phabricator.haskell.org/D4003
I simply retain the live registers of the top proc instead of updating them to 
the optimized set.

As such we generate the correct function signature in the llvm backend.

Sent from my iPhone

> On 22 Sep 2017, at 2:08 AM, Kavon Farvardin  wrote:
> 
> Let me elaborate a bit more because I clearly missed some points you already 
> made in your original message. Sorry about that:
> 
> 
> I don't think we need a heavyweight solution to this problem (the suggestions 
> of: disabling overlapping registers for LLVM, or adding a new virtual 
> register class Vx).
> 
> Instead, let's first remember how the type of the called function pointer 
> corresponds to its calling convention when it is lowered to assembly in LLVM. 
> In our GHC calling convention in LLVM, we can specify that
> 
>if type == float OR type == double, use:
>XMM1,XMM2,XMM3,XMM4,XMM5,XMM6
> 
> When a calling convention is being determined by LLVM for any function 
> definition or call, it goes in order from left to right in the list of 
> parameters, and assigns float or double arguments to the first currently 
> available register in that XMM list.
> 
> So, if `q` were indeed using F3 and D4 to accept its first two floating point 
> arguments, the function signature we generate,
> 
>ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
> double)
> 
> is wrong. The registers for the `float, double` arguments will be assigned to 
> XMM1 and XMM2 by LLVM. Since F3 and D4 use XMM3 and XMM4, respectively, we 
> should have padded out the type of `q` in LLVM to be:
> 
> 
>ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
> double, float, double)
> 
> where the first `float, double` parameters are now unused. We would also 
> perform the same type of padding at every call site where the first two float 
> arguments are F3 and D4, so that they end up in the right physical registers.
> We pass `undef` for the first two `float, double` arguments.
> 
> 
> 
>> On Sep 21, 2017, at 12:32 PM, Kavon Farvardin  wrote:
>> 
>> Responses are inline below:
>> 
>>> As the LLVM backend takes off from Cmm, we produce function that always hold
>>> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, 
>>> R4, R5, R6, SpLim)
>>> and appends those registers that are live throughout the function call: in 
>>> the
>>> case of `q` this is one Float and one Double register.
>> 
>> To be more precise, we append only the live floating point or vector 
>> arguments to this always live list. We need to do this because of 
>> overlapping register usage in our calling convention on x86-64 (F1 and D1 
>> are both put in XMM1). See Note [Overlapping global registers] for details.
>> 
>> 
>>> Let’s assume these are F3 and D4.  Thus the function signature we generate 
>>> looks like:
>>> 
>>> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
>>> double)
>>> 
>>> And expect the passed arguments to represent the following registers:
>>> 
>>>base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>>> 
>>> as we found that f1 and d1 are not live.
>> 
>> 
>> I think it's wrong to assume that `q` accepts its first two floating-point 
>> arguments in F3 and D4, because I'm pretty sure the standard Cmm calling 
>> convention assigns them to F1 and D2, respectively. Are we actually 
>> outputting `q` such that F3 and D4 are used?
>> 
>> 
>>> (This is where my llvmng backend fell over, as it does not bitcast function
>>> signatures but tries to unify them.)
>> 
>> 
>> I think to solve this problem, we'll want to bitcast functions whenever we 
>> call them, because the type of an LLVM function is important for us to get 
>> the calling convention correct.
>> 
>> 
>> ~kavon
>> 
>> 
>>> On Sep 20, 2017, at 4:44 AM, Moritz Angermann  
>>> wrote:
>>> 
>>> Hi *,
>>> 
>>> TLDR: The LLVM backend might confuse floating registers in GHC.
>>> 
>>> # Demo (Ticket #14251)
>>> 
>>> Let Demo.hs be the following short program (a minor modification from 
>>> T6084):
>>> ```
>>> {-# LANGUAGE MagicHash, BangPatterns #-}
>>> module Main where
>>> 
>>> import GHC.Exts
>>> 
>>> {-# NOINLINE f #-}
>>> f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
>>> f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
>>> 
>>> {-# NOINLINE q #-}
>>> q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
>>> q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)
>>> 
>>> main = putStrLn (f $ q)
>>> ```
>>> 
>>> What happens if we compile them with the NCG and LLVM?
>>> 
>>> $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg 

Re: Determine instance method from class method callsite

2017-09-21 Thread Robin Palotai
My conclusion so far: there's no royal way. One can get the instance
dictionary DFunId pretty easy, but then access to the Typechecked AST of
the instance declaration is really needed to find all the method bindings
$c... (that get applied when constructing the dictionary $d...).

2017-09-19 7:38 GMT+02:00 Robin Palotai :

> Sorry, I messed up subject and mailing list. Copying to both list now
> after the mistake (wanted only ghc-devs for specificity).
>
> Thanks!
>
> 2017-09-19 7:36 GMT+02:00 Robin Palotai :
>
>> Hello GHC devs,
>>
>> Before inventing the wheel, want to check if there is a GHC API way to
>> look up the (fully) resolved instance method from a class method.
>>
>> For example, given a code
>>
>> data Foo Int deriving Show
>>
>> bar = show (Foo 3)
>>
>> when inspecting the Typechecked AST for bar's show call, I would like to
>> get to the Name / Id of 'show' of the 'Show' typeclass.
>>
>> I believe I could use splitHsSigmaTy on the HsType of the function call
>> to get the context, and then evaluate the HsWrapper somehow to find out
>> what instance dictionary is applied to the class restriction in the
>> context, and then look up the instance method from the dictionary..
>>
>> Two questions:
>>
>> 1) Is there maybe functionality for this?
>>
>> 2) If not, is there any guarantee about the constraint order in the
>> context, at the method call? So I could more easily determine which
>> constraint's application to look for..
>>
>> Any hints welcome && Thank you!
>> Robin
>>
>
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: The Curious Case of T6084 -or- Register Confusion with LLVM

2017-09-21 Thread Kavon Farvardin
Let me elaborate a bit more because I clearly missed some points you already 
made in your original message. Sorry about that:


I don't think we need a heavyweight solution to this problem (the suggestions 
of: disabling overlapping registers for LLVM, or adding a new virtual register 
class Vx).

Instead, let's first remember how the type of the called function pointer 
corresponds to its calling convention when it is lowered to assembly in LLVM. 
In our GHC calling convention in LLVM, we can specify that

if type == float OR type == double, use:
XMM1,XMM2,XMM3,XMM4,XMM5,XMM6

When a calling convention is being determined by LLVM for any function 
definition or call, it goes in order from left to right in the list of 
parameters, and assigns float or double arguments to the first currently 
available register in that XMM list.

So, if `q` were indeed using F3 and D4 to accept its first two floating point 
arguments, the function signature we generate,

ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
double)

is wrong. The registers for the `float, double` arguments will be assigned to 
XMM1 and XMM2 by LLVM. Since F3 and D4 use XMM3 and XMM4, respectively, we 
should have padded out the type of `q` in LLVM to be:


ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
double, float, double)

where the first `float, double` parameters are now unused. We would also 
perform the same type of padding at every call site where the first two float 
arguments are F3 and D4, so that they end up in the right physical registers.
We pass `undef` for the first two `float, double` arguments.



> On Sep 21, 2017, at 12:32 PM, Kavon Farvardin  wrote:
> 
> Responses are inline below:
> 
>> As the LLVM backend takes off from Cmm, we produce function that always hold
>> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, 
>> R4, R5, R6, SpLim)
>> and appends those registers that are live throughout the function call: in 
>> the
>> case of `q` this is one Float and one Double register.
> 
> To be more precise, we append only the live floating point or vector 
> arguments to this always live list. We need to do this because of overlapping 
> register usage in our calling convention on x86-64 (F1 and D1 are both put in 
> XMM1). See Note [Overlapping global registers] for details.
> 
> 
>> Let’s assume these are F3 and D4.  Thus the function signature we generate 
>> looks like:
>> 
>> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
>> double)
>> 
>> And expect the passed arguments to represent the following registers:
>> 
>> base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>> 
>> as we found that f1 and d1 are not live.
> 
> 
> I think it's wrong to assume that `q` accepts its first two floating-point 
> arguments in F3 and D4, because I'm pretty sure the standard Cmm calling 
> convention assigns them to F1 and D2, respectively. Are we actually 
> outputting `q` such that F3 and D4 are used?
> 
> 
>> (This is where my llvmng backend fell over, as it does not bitcast function
>> signatures but tries to unify them.)
> 
> 
> I think to solve this problem, we'll want to bitcast functions whenever we 
> call them, because the type of an LLVM function is important for us to get 
> the calling convention correct.
> 
> 
> ~kavon
> 
> 
>> On Sep 20, 2017, at 4:44 AM, Moritz Angermann  
>> wrote:
>> 
>> Hi *,
>> 
>> TLDR: The LLVM backend might confuse floating registers in GHC.
>> 
>> # Demo (Ticket #14251)
>> 
>> Let Demo.hs be the following short program (a minor modification from T6084):
>> ```
>> {-# LANGUAGE MagicHash, BangPatterns #-}
>> module Main where
>> 
>> import GHC.Exts
>> 
>> {-# NOINLINE f #-}
>> f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
>> f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
>> 
>> {-# NOINLINE q #-}
>> q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
>> q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)
>> 
>> main = putStrLn (f $ q)
>> ```
>> 
>> What happens if we compile them with the NCG and LLVM?
>> 
>> $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
>> Hello 6.0 6.9 World!
>> 
>> $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
>> Hello 4.0 5.0 World!
>> 
>> # Discussion
>> 
>> What is happening here?  The LLVM backend passes the registers in arguments,
>> which are then mapped to registers via the GHC calling convention we added
>> to LLVM.
>> 
>> As the LLVM backend takes off from Cmm, we produce function that always hold
>> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, 
>> R4, R5, R6, SpLim)
>> and appends those registers that are live throughout the function call: in 
>> the
>> case of `q` this is one Float and one Double register. Let’s assume these are
>> F3 and D4.  Thus the function signature we 

Re: The Curious Case of T6084 -or- Register Confusion with LLVM

2017-09-21 Thread Kavon Farvardin
Responses are inline below:

> As the LLVM backend takes off from Cmm, we produce function that always hold
> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, 
> R5, R6, SpLim)
> and appends those registers that are live throughout the function call: in the
> case of `q` this is one Float and one Double register.

To be more precise, we append only the live floating point or vector arguments 
to this always live list. We need to do this because of overlapping register 
usage in our calling convention on x86-64 (F1 and D1 are both put in XMM1). See 
Note [Overlapping global registers] for details.


> Let’s assume these are F3 and D4.  Thus the function signature we generate 
> looks like:
> 
> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
> double)
> 
> And expect the passed arguments to represent the following registers:
> 
>  base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
> 
> as we found that f1 and d1 are not live.


I think it's wrong to assume that `q` accepts its first two floating-point 
arguments in F3 and D4, because I'm pretty sure the standard Cmm calling 
convention assigns them to F1 and D2, respectively. Are we actually outputting 
`q` such that F3 and D4 are used?


> (This is where my llvmng backend fell over, as it does not bitcast function
> signatures but tries to unify them.)


I think to solve this problem, we'll want to bitcast functions whenever we call 
them, because the type of an LLVM function is important for us to get the 
calling convention correct.


~kavon


> On Sep 20, 2017, at 4:44 AM, Moritz Angermann  
> wrote:
> 
> Hi *,
> 
> TLDR: The LLVM backend might confuse floating registers in GHC.
> 
> # Demo (Ticket #14251)
> 
> Let Demo.hs be the following short program (a minor modification from T6084):
> ```
> {-# LANGUAGE MagicHash, BangPatterns #-}
> module Main where
> 
> import GHC.Exts
> 
> {-# NOINLINE f #-}
> f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
> f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
> 
> {-# NOINLINE q #-}
> q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
> q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)
> 
> main = putStrLn (f $ q)
> ```
> 
> What happens if we compile them with the NCG and LLVM?
> 
> $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
> Hello 6.0 6.9 World!
> 
> $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
> Hello 4.0 5.0 World!
> 
> # Discussion
> 
> What is happening here?  The LLVM backend passes the registers in arguments,
> which are then mapped to registers via the GHC calling convention we added
> to LLVM.
> 
> As the LLVM backend takes off from Cmm, we produce function that always hold
> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, 
> R5, R6, SpLim)
> and appends those registers that are live throughout the function call: in the
> case of `q` this is one Float and one Double register. Let’s assume these are
> F3 and D4.  Thus the function signature we generate looks like:
> 
> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, 
> double)
> 
> And expect the passed arguments to represent the following registers:
> 
>  base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
> 
> as we found that f1 and d1 are not live.
> 
> Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 
> 14 arguments
> instead of 12.  To make this “typecheck” in LLVM, we
> 
> @q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, 
> float, double, float, double)
> 
> and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).
> 
> at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
> the passed arguments f3 and d4.
> 
> (This is where my llvmng backend fell over, as it does not bitcast function
> signatures but tries to unify them.)
> 
> # Solution?
> 
> Initially, Ben and I though we could simply always pass all registers as
> arguments in LLVM and call it a day with the downside of create more verbose
> but correct code.  As I found out, that comes with a few complications.  For
> some reason, all active stg registers for my machine give me 
> 
>  Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
>  F1, D1, F2, D2, F3, D3,
>  XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
>  YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
>  ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6
> 
> I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor 
> AVX512;
> that looks like only a patch away.  However we try to optimize our register, 
> such
> that we can pass up to six doubles or six floats or any combination of both 
> if needed
> in registers, without having to allocate them on the stack, by assuming 
> overlapping
> registers (See Note [Overlapping global registers]).
> 
> And as such a full function signature in LLVM would as opposed to one that’s 
> based on

CircleCI (Was: Disable Travis?)

2017-09-21 Thread Joachim Breitner
Hi,

Am Donnerstag, den 21.09.2017, 14:25 +0200 schrieb Boespflug, Mathieu:
> It took me no more than a couple hours to get this working, but using
> CircleCI, for our fork of GHC. I started from Joachim's TravisCI
> script.
> 
> https://circleci.com/gh/tweag/ghc/tree/circleci
> 
> It would be trivial to activate this for github.com/ghc/ghc as well.
> 
> A few notes:
> - It runs ./validate --fast in 40 minutes.
> - CircleCI has OS X support as well. I think we should just migrate
> to
> using CircleCI for OS X testing instead of the custom drones, one or
> all of which are currently down.
> - CircleCI graciously agreed to running on one of the beefy AWS node
> types, called c4.xlarge (8 cores). On the standard node type (2
> cores), validate takes just over an hour to run. It would be great if
> ./validate could scale better to more cores.

nice! Yes, let’s do this. More CI never hurts (if someone keeps an eye
on it and fixes breakage that is not due to the code).

Can you configure circleci to mail both the committeer and a specific
person (e.g. you, or me) on every failed committ?

I enabled it now for ghc/ghc, but it says

> Configurable resource class is not enabled in your project. Please 
> contact your CSM person or our support team to whitelist your 
> project.

Can you do that?

Greetings,
Joachim


-- 
Joachim Breitner
  m...@joachim-breitner.de
  http://www.joachim-breitner.de/


signature.asc
Description: This is a digitally signed message part
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Disabling Travis?

2017-09-21 Thread Boespflug, Mathieu
It took me no more than a couple hours to get this working, but using
CircleCI, for our fork of GHC. I started from Joachim's TravisCI
script.

https://circleci.com/gh/tweag/ghc/tree/circleci

It would be trivial to activate this for github.com/ghc/ghc as well.

A few notes:
- It runs ./validate --fast in 40 minutes.
- CircleCI has OS X support as well. I think we should just migrate to
using CircleCI for OS X testing instead of the custom drones, one or
all of which are currently down.
- CircleCI graciously agreed to running on one of the beefy AWS node
types, called c4.xlarge (8 cores). On the standard node type (2
cores), validate takes just over an hour to run. It would be great if
./validate could scale better to more cores.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: Invariants about UnivCo?

2017-09-21 Thread Simon Peyton Jones via ghc-devs
Some thoughts


  *   Read Note [Coercion holes] in TyCoRep.



  *   As you’ll see, generally we don’t create value-bindings for (unboxed) 
coercions of type t1 ~# t2.  (yes for boxed ones t1 ~ t2). Reasons in the 
Note.  Exception: for superclasses of Givens we do create(co :: a ~# b) = 
sc_sel1 d

where d is some dictionary with a superclass of type (a ~# b).

Side note: the use of “cobox” is wildly unhelpful.  These Ids are specifically 
unboxed!  I’m going to change it to just “co”.


  *   You appear to have bindings like[G]  cobox_a67J = CO Sym cobox_a654.  
That is suspicious.  Who is creating them?  It may not actually be wrong but 
it’s suspicious.  The time it’d be outright wrong is if you dropped the 
ev-binds on the floor.

Ha!  runTcSEqualites makes up an ev_binds_var, and solves the equalities – but 
it should be the case that no value bindings end up in the ev_binds_var.  
(reason: we are solving equalities in a type signature, so there is no place to 
put the evidence bindigns)   I suggest you add a DEBUG-only assertion to check 
this.


  *   Do -ddump-tc -fprint-typechecker-elaboration; that should show you the 
evidence binds.

Can I ask you a favour?  Separately from your branch, can you start a branch of 
small patches to GHC that include

  *   Extra assertions, such as that above
  *   Notes that explain things you wish you’d known earlier, with references 
to those Notes from the places you were studying when you that information 
would have been useful

Richard and I know too much! – your learning curve is very valuable and I don’t 
want to lose it.

Keeping this separate from your branch is useful : you can commit (via Phab) 
these updates right away, so they aren’t predicated on adding row types to GHC.

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Nicolas Frisby
Sent: 19 September 2017 16:51
To: ghc-devs@haskell.org
Subject: Invariants about UnivCo?

[I summarize with some direct questions at the bottom of this email.]

I spent time last night trying to eliminate -dcore-lint errors from my record 
and variant library using the coxswain row types plugin. I made some progress, 
but I'm currently stuck, as discussed on this github Issue.

https://github.com/nfrisby/coxswain/issues/3#issuecomment-330577609

Here's the relevant bit:

The latest unresolved -dcore-lint error is an out-of-scope cobox co var. I'm 
certainly not creating it directly (there are no U(plugin:coxswain,... in the 
Core Lint warning), but I have to wonder if my somewhat loose use of UnivCo is 
violating some assumptions somewhere that's causing GHC to drop the co var 
binding or overlook this occurrence of it on a renaming/subst pass. I checked 
UnivCo for source comments looking for anything it should not be used for, but 
I didn't find an obvious explanation along those lines.

I haven't yet been able to effectively distill the test case.

I'm doing this all at -O0.

With `-ddump-tc-trace`, I can see the offending cobox (cobox_a67M) is present 
in an "implication evbinds" listing after a "solveImplication end }" delimiter, 
but that's the last obvious binding of it.

 [G] cobox_a67J = CO Sym cobox_a654,
 [G] cobox_a67M
   = cobox_a67J `cast` U(plugin:coxswain,...)

cobox_a654 is introduced by a GADT pattern match.

I'm also not seeing obvious occurrences of cobox_a67M, but I think the reason 
is that I'm seeing several (Sym cobox) with no uniques printed (even with 
`-dppr-debug`). Those are probably the cobox in question, but I can't confirm.

Questions:

1) Is there a robust way to ensure that covar's uniques are always printed? (Is 
the pprIface reuse  with a free cobox part of the issue here?)

2) Is my plugin asking for this kind of trouble by using UnivCo to cast coboxes?

3) If I spent the effort to create non-UnivCo coercions where possible, would 
that likely help? This is currently an "eventually" task, but I haven't seen an 
urgency for it yet. I could bump its priority if it might help. E.G. I'm using 
UnivCo to cast entire givens when all I'm doing is reducing a type family 
application somewhere "deep" within the given's predtype. I could, with 
considerable effort, instead wrap a single, localized UnivCo within a bunch of 
non-UnivCo "lifting" coercion constructors. Would that likely help?

3) Is there a usual suspect for this kind of situation where a cobox binding is 
seemingly dropped (by the typechecker) even though there's an occurrence of it?

Thank you for your time. -Nick
___
ghc-devs mailing list
ghc-devs@haskell.org

RE: The Curious Case of T6084 -or- Register Confusion with LLVM

2017-09-21 Thread Simon Peyton Jones via ghc-devs
|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type).

That's be entirely possible for "known" calls, where the target is known, but 
not for "unknown" (i.e higher order) ones where the target of the call varies.

The "Making a fast curry" paper goes into this in some detail.  I think we 
already have different entry points for these two cases.  So maybe they could 
have different entry conventions...

Simon

|  -Original Message-
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Ben Gamari
|  Sent: 20 September 2017 16:54
|  To: Moritz Angermann ; GHC developers 
|  Subject: Re: The Curious Case of T6084 -or- Register Confusion with LLVM
|  
|  Moritz Angermann  writes:
|  
|  [snip]
|  >
|  > I should not have the YMM*, and ZMM* registers as I don’t have any AVX
|  > nor AVX512; that looks like only a patch away.  However we try to
|  > optimize our register, such that we can pass up to six doubles or six
|  > floats or any combination of both if needed in registers, without
|  > having to allocate them on the stack, by assuming overlapping registers
|  (See Note [Overlapping global registers]).
|  >
|  > And as such a full function signature in LLVM would as opposed to one
|  > that’s based on the “live” registers as we have right now, would
|  > consist of 12 float/double registers, and LLVM only maps 6.  My
|  > current idea is to, pass only the explicit F1,D1,…,F3,D3 and try to
|  > disable the register overlapping for LLVM.  This would probably force
|  > more floating values to be stack allocated rather than passed via
|  > registers, but would likely guarantee that the registers match up.
|  > The other option I can think of is to define some viertual generic
|  > floating registers in the llvm code gen: V1,…,V6 and then perform
|  > something like
|  >
|  >   F1 <- V1 as float
|  >   D1 <- V1 as double
|  >
|  > in the body of the function, while trying to use the `live`
|  > information at the call site to decide which of F1 or D1 to pass as V1.
|  >
|  Arguably the fundamental problem here is the assumption that all STG entry-
|  points have the same machine-level calling convention. As you point out, our
|  calling conventions in fact change due to things like register overlap.
|  Ideally the LLVM we produce would reflect this.
|  
|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type). Unfortunately a brief look at the code generator suggests that this
|  may require a fair amount of plumbing.
|  
|  It's important to note though that this overlap problem is something that
|  will need to be addressed eventually if we are are to have proper SIMD
|  support (due to overlap between XMM, YMM, and ZMM).
|  
|  Cheers,
|  
|  - Ben
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


RE: The Curious Case of T6084 -or- Register Confusion with LLVM

2017-09-21 Thread Simon Peyton Jones via ghc-devs
Moritz

Talk to Kavon.  He was thinking about passing a struct instead of a huge list 
of registers, and only initialising the live fields of the struct.  I don't 
know whether he ultimately discarded the idea, but it sounded promising.

Simon

|  -Original Message-
|  From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Moritz
|  Angermann
|  Sent: 20 September 2017 10:45
|  To: GHC developers 
|  Subject: The Curious Case of T6084 -or- Register Confusion with LLVM
|  
|  Hi *,
|  
|  TLDR: The LLVM backend might confuse floating registers in GHC.
|  
|  # Demo (Ticket #14251)
|  
|  Let Demo.hs be the following short program (a minor modification from
|  T6084):
|  ```
|  {-# LANGUAGE MagicHash, BangPatterns #-} module Main where
|  
|  import GHC.Exts
|  
|  {-# NOINLINE f #-}
|  f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String f
|  g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
|  
|  {-# NOINLINE q #-}
|  q :: Int# -> Float# -> Double# -> Float# -> Double# -> String q i j k l m =
|  "Hello " ++ show (F# l) ++ " " ++ show (D# m)
|  
|  main = putStrLn (f $ q)
|  ```
|  
|  What happens if we compile them with the NCG and LLVM?
|  
|  $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg Hello 6.0
|  6.9 World!
|  
|  $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm Hello
|  4.0 5.0 World!
|  
|  # Discussion
|  
|  What is happening here?  The LLVM backend passes the registers in arguments,
|  which are then mapped to registers via the GHC calling convention we added
|  to LLVM.
|  
|  As the LLVM backend takes off from Cmm, we produce function that always hold
|  the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3,
|  R4, R5, R6, SpLim) and appends those registers that are live throughout the
|  function call: in the case of `q` this is one Float and one Double register.
|  Let’s assume these are
|  F3 and D4.  Thus the function signature we generate looks like:
|  
|  ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float,
|  double)
|  
|  And expect the passed arguments to represent the following registers:
|  
|base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
|  
|  as we found that f1 and d1 are not live.
|  
|  Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it
|  14 arguments instead of 12.  To make this “typecheck” in LLVM, we
|  
|  @q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64,
|  float, double, float, double)
|  
|  and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).
|  
|  at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
|  the passed arguments f3 and d4.
|  
|  (This is where my llvmng backend fell over, as it does not bitcast function
|  signatures but tries to unify them.)
|  
|  # Solution?
|  
|  Initially, Ben and I though we could simply always pass all registers as
|  arguments in LLVM and call it a day with the downside of create more verbose
|  but correct code.  As I found out, that comes with a few complications.  For
|  some reason, all active stg registers for my machine give me
|  
|Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
|F1, D1, F2, D2, F3, D3,
|XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
|YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
|ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6
|  
|  I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor
|  AVX512; that looks like only a patch away.  However we try to optimize our
|  register, such that we can pass up to six doubles or six floats or any
|  combination of both if needed in registers, without having to allocate them
|  on the stack, by assuming overlapping registers (See Note [Overlapping
|  global registers]).
|  
|  And as such a full function signature in LLVM would as opposed to one that’s
|  based on the “live” registers as we have right now, would consist of 12
|  float/double registers, and LLVM only maps 6.  My current idea is to, pass
|  only the explicit F1,D1,…,F3,D3 and try to disable the register overlapping
|  for LLVM.  This would probably force more floating values to be stack
|  allocated rather than passed via registers, but would likely guarantee that
|  the registers match up.  The other option I can think of is to define some
|  viertual generic floating registers in the llvm code gen: V1,…,V6 and then
|  perform something like
|  
|F1 <- V1 as float
|D1 <- V1 as double
|  
|  in the body of the function, while trying to use the `live` information at
|  the call site to decide which of F1 or D1 to pass as V1.
|  
|  Ideas?
|  
|  Cheers,
|   Moritz
|  
|  ___
|  ghc-devs mailing list
|  ghc-devs@haskell.org
|  https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell
|  .org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
|