Indeed STG to Cmm lowering drops the correct size information for
ccall arguments, there's even a TODO comment that has been around for
quite a few years:
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/StgToCmm/Foreign.hs#L83

This has been an annoyance for Asterius as well. When we try to
translate a CmmUnsafeForeignCall node to a wasm function call, a CInt
argument (which should be i32 in wasm) can be mistyped as i64 which
causes a validation error. We have to insert wrap/extend opcodes based
on the callee function signature, but if we preserve correct argument
size in Cmm (or at least enrich the hints to include it), we won't
need such a hack.

On Tue, Oct 20, 2020 at 4:05 PM Moritz Angermann
<moritz.angerm...@gmail.com> wrote:
>
> Yes, that's right. I'm not sure it's in core though, as the width information 
> still seems to be available in Stg. However the lowering from
> stg into cmm widens it.
>
> On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <carter.schonw...@gmail.com> 
> wrote:
>>
>> ... are you talking about Haskell Int and word? Those are always the same 
>> size in bits and should match native point size. That is definitely an 
>> assumption of ghc
>>
>> It sounds like some information that is dropped after core is needed to 
>> correctly do something in stg/cmm in the context of the ARM64 ncg that was 
>> recently added to handle cint being 32bit in this context ?
>>
>>
>> On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann 
>> <moritz.angerm...@gmail.com> wrote:
>>>
>>> Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  
>>> The Linux side of things is looking really good,
>>> so I've moved onto the macOS side (I'm afraid I don't have any Windows 
>>> aarch64 hardware, nor much windows knowledge
>>> to even attempt a Windows version yet).
>>>
>>> When calling C functions, the usual approach is to pass the first few 
>>> arguments in registers, and then arguments that exceed
>>> the argument passing slots on the stack.  The Arm AArch64 Procedure Call 
>>> Standard (aapcs) for C does this by assigning 8byte
>>> slots to each overflow argument on the stack.  A company I won't name, has 
>>> decided to implement a slightly different variation of
>>> the Procedure Call Standard, often referred to as darwinpcs.  This deviates 
>>> from the aapcs for vargs, as well as for handling of
>>> spilled arguments on the stack.
>>>
>>> The aapcs allows us to generate calls to C functions without knowing the 
>>> actual prototype of the function, as all arguments are
>>> simply spilled into 8byte slots on the stack.  The darwinpcs however 
>>> requires us to know the size of the arguments, so we can
>>> properly pack them onto the stack.  Ints have 4 bytes, so we need to pack 
>>> them into 4byte slots.
>>>
>>> In the process library we have this rather fun foreign import:
>>> foreign import ccall unsafe "runInteractiveProcess"
>>>   c_runInteractiveProcess
>>>         ::  Ptr CString
>>>         -> CString
>>>         -> Ptr CString
>>>         -> FD
>>>         -> FD
>>>         -> FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr CGid
>>>         -> Ptr CUid
>>>         -> CInt                         -- reset child's SIGINT & SIGQUIT 
>>> handlers
>>>         -> CInt                         -- flags
>>>         -> Ptr CString
>>>         -> IO PHANDLE
>>>
>>> with the corresponding C declaration:
>>>
>>> extern ProcHandle runInteractiveProcess( char *const args[],
>>>                                          char *workingDirectory,
>>>                                          char **environment,
>>>                                          int fdStdIn,
>>>                                          int fdStdOut,
>>>                                          int fdStdErr,
>>>                                          int *pfdStdInput,
>>>                                          int *pfdStdOutput,
>>>                                          int *pfdStdError,
>>>                                          gid_t *childGroup,
>>>                                          uid_t *childUser,
>>>                                          int reset_int_quit_handlers,
>>>                                          int flags,
>>>                                          char **failed_doing);
>>> This function thus takes 14 arguments. We pass only the first 8 arguments 
>>> in registers, and the others on the stack.
>>> Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass 
>>> those in 8byte slots on the stack. That is
>>> both of them are effectively 64bits wide when passed.  However for 
>>> darwinpcs, it is expected that these adhere to their
>>> size and are packed as such. Therefore Argument 12 and 13 need to be passed 
>>> as 4byte slots each on the stack.
>>>
>>> This yields a moderate 8byte saving on the stack for the same function call 
>>> on darwinpcs compared to aapcs.
>>>
>>> Now onto GHC.  When we generate function calls for foreign C functions, we 
>>> deal with something like:
>>>
>>> genCCall
>>>     :: ForeignTarget      -- function to call
>>>     -> [CmmFormal]        -- where to put the result
>>>     -> [CmmActual]        -- arguments (of mixed type)
>>>     -> BlockId            -- The block we are in
>>>     -> NatM (InstrBlock, Maybe BlockId)
>>>
>>> based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args
>>>
>>> The CmmActual in the care of runInteractiveProcess hold the arguments for 
>>> the function, however contrary to the function
>>> declaration, it contains I64 slots for Argument 12 and 13. Thus computing 
>>> the space needed for them based on their Cmm
>>> Representations yields 8bytes, when they should really be 32bit and consume 
>>> only 4 byte.
>>>
>>> To illustrate this a bit better: here is what we see in the pretty printed 
>>> cmm:
>>>
>>> (_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, 
>>> signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, 
>>> signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 
>>> 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 
>>> 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 
>>> 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);
>>>
>>> I've added size information to the ForeignHints (NoHint, AddrHint, 
>>> SignedHint) we have, and computed both, which yields:
>>>
>>> [(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]
>>>
>>> Thus, while we *do* know the right size from STG (which is what the Hints 
>>> are computed from), we loose this information when lowering
>>> into Cmm, where we represent them with W64. This is what I was alluding to 
>>> in the previous email. In primRepCmmType, and mkIntCLit, we set their type 
>>> to 64bit for Ints; which on this platform does not hold.
>>>
>>> Now I've gone ahead and effectively assume Cmm is lying to me when 
>>> generating Foreign Function Calls, and rely on the (new) sized
>>> hints to produce the appropriate argument packing on the stack.  However I 
>>> believe the correct way would be for GHC not to conflate Ints
>>> and Words in Cmm; implicitly assuming they are the same width.  Sadly it's 
>>> not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I 
>>> fear GHC internally assumes "Int" means 64bit Integer, and then just 
>>> happens to make the Int ~ CInt assumption.
>>>
>>> Cheers,
>>>  Moritz
>>>
>>> On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <simo...@microsoft.com> 
>>> wrote:
>>>>
>>>> Moritz
>>>>
>>>>
>>>>
>>>> I’m afraid I don’t understand any of this.  Not your fault, but  I just 
>>>> don’t have enough context to know what you mean.
>>>>
>>>>
>>>>
>>>> Is there a current bug?  If so, can you demonstrate it?   If not, what is 
>>>> the problem you want to solve?  Examples are always helpful.
>>>>
>>>>
>>>>
>>>> Maybe it’s worth opening a ticket too?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>> Simon
>>>>
>>>>
>>>>
>>>> From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Moritz Angermann
>>>> Sent: 20 October 2020 02:51
>>>> To: ghc-devs <ghc-devs@haskell.org>
>>>> Subject: GHC's internal confusion about Ints and Words
>>>>
>>>>
>>>>
>>>> Hi there!
>>>>
>>>>
>>>>
>>>> So there is a procedure calling convention that for reasons I did not 
>>>> fully understand, but seem to be historically grown, uses packed arguments 
>>>> for those that are spilled onto the stack. On top of that, CInt is 32bit, 
>>>> Word is 64bits. This provides the following spectacle:
>>>>
>>>>
>>>>
>>>> While we know in STG that the CInt is 32bits wide, when lowered into Cmm, 
>>>> it's represented as I64 in the arguments to the C function.  Thus packing 
>>>> based on the format of the Cmm type would yield 8 bytes. And now, all 
>>>> further packed arguments have the wrong offset (by four).
>>>>
>>>>
>>>>
>>>> Specifically in GHC.Cmm.Utils we find:
>>>>
>>>> primRepCmmType :: Platform -> PrimRep -> CmmType
>>>>
>>>> primRepCmmType platform IntRep = bWord platform
>>>>
>>>>
>>>>
>>>> mkIntCLit :: Platform -> Int -> CmmLit
>>>> mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)
>>>>
>>>>
>>>>
>>>> The naive idea to just fix this and make them return cIntWidth instead, 
>>>> seemingly produces the correct Cmm expressions at a local level, but 
>>>> produces a broken compiler.
>>>>
>>>>
>>>>
>>>> A second approach could be to extend the Hints into providing sizes, and 
>>>> using those during the foreign call generation to pack spilled arguments.  
>>>> This however appears to be more of a patching up of some fundamental 
>>>> underlying issue, instead of rectifying it properly.
>>>>
>>>>
>>>>
>>>> Maybe I'll have to go down the Hint path, it does however break current Eq 
>>>> assumptions, as they are sized now, and what was equal before, is only 
>>>> equal now if they represent the same size.
>>>>
>>>>
>>>>
>>>> From a cursory glance at the issues with naively fixing the width for Int, 
>>>> it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe 
>>>> there is a whole level of HsInt vs CInt discrimination missing?
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>  Moritz
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs@haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to