Re: FFI-free NaN checks? (isDoubleNan and friends)

2018-03-06 Thread Sylvain Henry

Hi,

You can try with foreign primops, it should be faster than the FFI:

In IsDoubleNanPrim.s:

.global isDoubleNan_prim
isDoubleNan_prim:
   xor %rbx,%rbx
   ucomisd %xmm1, %xmm1
   lahf
   testb $68, %ah
   jnp .Lout
   mov $1, %rbx
.Lout:
   jmp * (%rbp)


In IsDoubleNan.hs:

{-# LANGUAGE GHCForeignImportPrim #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnliftedFFITypes #-}

module Main where

import GHC.Base

foreign import prim "isDoubleNan_prim" isDoubleNan_prim :: Double# -> Int#

isDoubleNan :: Double -> Bool
isDoubleNan (D# d#) = case isDoubleNan_prim d# of
   0# -> False
   _  -> True

main :: IO ()
main = do
   let testNaN x = putStrLn $ "Testing " ++ show x ++ ": " ++ show 
(isDoubleNan x)

   testNaN 10.3
   testNaN (0/0)

Compile with: ghc -Wall -O IsDoubleNan.hs IsDoubleNanPrim.s

I haven't benchmarked this but I would be interested to see the 
comparison with the other versions on your benchmarks!


Cheers,
Sylvain


On 05/03/2018 22:53, Mateusz Kowalczyk wrote:

Hi,

Recently at a client I was profiling some code and isDoubleNaN lit up.
We were checking a lot of doubles for NaN as that's what customer would
send in.

I went to investigate and I found that FFI is used to achieve this. I
was always under the impression that FFI costs a little. I had at the
time replaced the code with a hack with great results:

```
isNaN' :: Double -> Bool
isNaN' d = d /= d
```

While this worked and provided good speedup in my case, this fails
catastrophically if the program is compiled with -ffast-math. This is
expected. I have since reverted it. Seeking an alternative solution I
have thought about re-implementing the C code with a native Haskell
version: after all it just checks a few bits. Apparently unsafeCoerce#
and friends were a big no-no but I found
https://phabricator.haskell.org/D3358 . I have implemented the code at
the bottom of this post. Obviously it's missing endianness (compile-time
switch).

This seems to be faster for smaller `mkInput` list than Prelude.isNaN
but slower slightly on the one below. The `/=` version is the fastest
but very fragile.

My question to you all is whether implementing a version of this
function in Haskell makes sense and if not, why not? The
stgDoubleToWord64 is implemented in CMM and I don't know anything about
the costs of that.

* Is there a cheaper alternative to FFI way?
* If yes, does anyone know how to write it such that it compiles to same
code but without the call overhead? I must have failed below as it's
slower on some inputs.

Basically if a faster way exists for isNaN, something I have to do a
lot, I'd love to hear about it.

I leave you with basic code I managed to come up with. 8.4.x only.


```
{-# LANGUAGE MagicHash#-}
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-}
module Main (main) where

import GHC.Float
import GHC.Prim

isNaN' :: Double -> Bool
isNaN' d = d /= d

isNaNBits :: Double -> Bool
isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
   1# -> case bits `and#` mantissaMask of
 0## -> False
 _ -> True
   _ -> False
   where
 bits :: Word#
 bits = stgDoubleToWord64 d

 expMask, mantissaMask :: Word#
 expMask = 0x7FF0##
 mantissaMask = 0x000F##

main :: IO ()
main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 1)
`seq` pure ()
   where
 nan :: Double
 nan = log (-1)

 mkInput :: Int -> [Double]
 mkInput n = take n $ cycle [1, nan]

 sumFilter :: (Double -> Bool) -> [Double] -> Double
 sumFilter p = Prelude.sum . Prelude.filter (not . p)
```



___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: FFI-free NaN checks? (isDoubleNan and friends)

2018-03-06 Thread Mateusz Kowalczyk
On 03/06/2018 10:43 AM, Brandon Allbery wrote:
> I'd in general expect good C code to optimize a little better; and in
> particular, decomposing an IEEE float is almost certainly more expensive in
> Haskell than in C, because unions let you cheat. (And I recall looking at
> the implementation of decodeFloat once; it's significantly longer than that
> C.) But I have to wonder if that code would be better done with something
> more native; the implementation may be a portable default, and you might be
> able to find something x86-specific that is faster.

There's a https://c9x.me/x86/html/file_module_x86_id_316.html that the
‘d /= d’ way compiles to. I suppose maybe I could just keep using that
and fall back onto isDoubleNaN if __FAST_MATH__ is set…

> On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk 
> wrote:
> 
>> On 03/05/2018 10:23 PM, Brandon Allbery wrote:
>>> If the FFI version is done with "safe", consider using "unsafe" instead.
>>> There are technical reasons why this is slightly incorrect, but unless
>>> you're fiddling with the CPU's FP control flags they're mostly irrelevant
>>> and you can treat isNaN as pure and non-side-effectful, significantly
>>> reducing the overhead. You may also be able to use "ccall" to take
>>> advantage of C compiler level optimizations, or simply to directly
>> invoke a
>>> CPU-based test with asm(); but you'll need to hide that in a C
>> preprocessor
>>> #define, so that it looks syntactically like a function call to the FFI.
>>>
>>> (One of the technical reasons is that various OSes have been known to
>>> introduce bugs in their FP register and state handling across system
>> calls,
>>> in which case the "safe" version may turn "complete FP chaos" into merely
>>> "wrong answer". It's your call whether, or which side, of this bothers
>> you.)
>>
>> Perhaps I was a little unclear. The FFI-using isDoubleNaN is something
>> GHC does!
>>
>> ```
>> libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN"
>> isDoubleNaN :: Double -> Int
>> ```
>> ```
>> HsInt
>> isDoubleNaN(HsDouble d)
>> {
>>   union stg_ieee754_dbl u;
>>
>>   u.d = d;
>>
>>   return (
>> u.ieee.exponent  == 2047 /* 2^11 - 1 */ &&  /* Is the exponent all
>> ones? */
>> (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0)
>> /* and the mantissa non-zero? */
>> );
>> }
>> ```
>>
>> My question is whether it could do better by not doing FFI and instead
>> computing natively and if not, why not?
>>
>>> On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk <
>> fuuze...@fuuzetsu.co.uk>
>>> wrote:
>>>
 Hi,

 Recently at a client I was profiling some code and isDoubleNaN lit up.
 We were checking a lot of doubles for NaN as that's what customer would
 send in.

 I went to investigate and I found that FFI is used to achieve this. I
 was always under the impression that FFI costs a little. I had at the
 time replaced the code with a hack with great results:

 ```
 isNaN' :: Double -> Bool
 isNaN' d = d /= d
 ```

 While this worked and provided good speedup in my case, this fails
 catastrophically if the program is compiled with -ffast-math. This is
 expected. I have since reverted it. Seeking an alternative solution I
 have thought about re-implementing the C code with a native Haskell
 version: after all it just checks a few bits. Apparently unsafeCoerce#
 and friends were a big no-no but I found
 https://phabricator.haskell.org/D3358 . I have implemented the code at
 the bottom of this post. Obviously it's missing endianness (compile-time
 switch).

 This seems to be faster for smaller `mkInput` list than Prelude.isNaN
 but slower slightly on the one below. The `/=` version is the fastest
 but very fragile.

 My question to you all is whether implementing a version of this
 function in Haskell makes sense and if not, why not? The
 stgDoubleToWord64 is implemented in CMM and I don't know anything about
 the costs of that.

 * Is there a cheaper alternative to FFI way?
 * If yes, does anyone know how to write it such that it compiles to same
 code but without the call overhead? I must have failed below as it's
 slower on some inputs.

 Basically if a faster way exists for isNaN, something I have to do a
 lot, I'd love to hear about it.

 I leave you with basic code I managed to come up with. 8.4.x only.


 ```
 {-# LANGUAGE MagicHash#-}
 {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm
>> #-}
 module Main (main) where

 import GHC.Float
 import GHC.Prim

 isNaN' :: Double -> Bool
 isNaN' d = d /= d

 isNaNBits :: Double -> Bool
 isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
   1# -> case bits `and#` mantissaMask of
 0## -> False
 _ -> True
   _ -> False
   

Re: FFI-free NaN checks? (isDoubleNan and friends)

2018-03-06 Thread Brandon Allbery
I'd in general expect good C code to optimize a little better; and in
particular, decomposing an IEEE float is almost certainly more expensive in
Haskell than in C, because unions let you cheat. (And I recall looking at
the implementation of decodeFloat once; it's significantly longer than that
C.) But I have to wonder if that code would be better done with something
more native; the implementation may be a portable default, and you might be
able to find something x86-specific that is faster.

On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk 
wrote:

> On 03/05/2018 10:23 PM, Brandon Allbery wrote:
> > If the FFI version is done with "safe", consider using "unsafe" instead.
> > There are technical reasons why this is slightly incorrect, but unless
> > you're fiddling with the CPU's FP control flags they're mostly irrelevant
> > and you can treat isNaN as pure and non-side-effectful, significantly
> > reducing the overhead. You may also be able to use "ccall" to take
> > advantage of C compiler level optimizations, or simply to directly
> invoke a
> > CPU-based test with asm(); but you'll need to hide that in a C
> preprocessor
> > #define, so that it looks syntactically like a function call to the FFI.
> >
> > (One of the technical reasons is that various OSes have been known to
> > introduce bugs in their FP register and state handling across system
> calls,
> > in which case the "safe" version may turn "complete FP chaos" into merely
> > "wrong answer". It's your call whether, or which side, of this bothers
> you.)
>
> Perhaps I was a little unclear. The FFI-using isDoubleNaN is something
> GHC does!
>
> ```
> libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN"
> isDoubleNaN :: Double -> Int
> ```
> ```
> HsInt
> isDoubleNaN(HsDouble d)
> {
>   union stg_ieee754_dbl u;
>
>   u.d = d;
>
>   return (
> u.ieee.exponent  == 2047 /* 2^11 - 1 */ &&  /* Is the exponent all
> ones? */
> (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0)
> /* and the mantissa non-zero? */
> );
> }
> ```
>
> My question is whether it could do better by not doing FFI and instead
> computing natively and if not, why not?
>
> > On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk <
> fuuze...@fuuzetsu.co.uk>
> > wrote:
> >
> >> Hi,
> >>
> >> Recently at a client I was profiling some code and isDoubleNaN lit up.
> >> We were checking a lot of doubles for NaN as that's what customer would
> >> send in.
> >>
> >> I went to investigate and I found that FFI is used to achieve this. I
> >> was always under the impression that FFI costs a little. I had at the
> >> time replaced the code with a hack with great results:
> >>
> >> ```
> >> isNaN' :: Double -> Bool
> >> isNaN' d = d /= d
> >> ```
> >>
> >> While this worked and provided good speedup in my case, this fails
> >> catastrophically if the program is compiled with -ffast-math. This is
> >> expected. I have since reverted it. Seeking an alternative solution I
> >> have thought about re-implementing the C code with a native Haskell
> >> version: after all it just checks a few bits. Apparently unsafeCoerce#
> >> and friends were a big no-no but I found
> >> https://phabricator.haskell.org/D3358 . I have implemented the code at
> >> the bottom of this post. Obviously it's missing endianness (compile-time
> >> switch).
> >>
> >> This seems to be faster for smaller `mkInput` list than Prelude.isNaN
> >> but slower slightly on the one below. The `/=` version is the fastest
> >> but very fragile.
> >>
> >> My question to you all is whether implementing a version of this
> >> function in Haskell makes sense and if not, why not? The
> >> stgDoubleToWord64 is implemented in CMM and I don't know anything about
> >> the costs of that.
> >>
> >> * Is there a cheaper alternative to FFI way?
> >> * If yes, does anyone know how to write it such that it compiles to same
> >> code but without the call overhead? I must have failed below as it's
> >> slower on some inputs.
> >>
> >> Basically if a faster way exists for isNaN, something I have to do a
> >> lot, I'd love to hear about it.
> >>
> >> I leave you with basic code I managed to come up with. 8.4.x only.
> >>
> >>
> >> ```
> >> {-# LANGUAGE MagicHash#-}
> >> {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm
> #-}
> >> module Main (main) where
> >>
> >> import GHC.Float
> >> import GHC.Prim
> >>
> >> isNaN' :: Double -> Bool
> >> isNaN' d = d /= d
> >>
> >> isNaNBits :: Double -> Bool
> >> isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
> >>   1# -> case bits `and#` mantissaMask of
> >> 0## -> False
> >> _ -> True
> >>   _ -> False
> >>   where
> >> bits :: Word#
> >> bits = stgDoubleToWord64 d
> >>
> >> expMask, mantissaMask :: Word#
> >> expMask = 0x7FF0##
> >> mantissaMask = 0x000F##
> >>
> >> main :: IO ()
> >> main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 1)
> >> `seq` 

Re: FFI-free NaN checks? (isDoubleNan and friends)

2018-03-06 Thread Mateusz Kowalczyk
On 03/05/2018 10:23 PM, Brandon Allbery wrote:
> If the FFI version is done with "safe", consider using "unsafe" instead.
> There are technical reasons why this is slightly incorrect, but unless
> you're fiddling with the CPU's FP control flags they're mostly irrelevant
> and you can treat isNaN as pure and non-side-effectful, significantly
> reducing the overhead. You may also be able to use "ccall" to take
> advantage of C compiler level optimizations, or simply to directly invoke a
> CPU-based test with asm(); but you'll need to hide that in a C preprocessor
> #define, so that it looks syntactically like a function call to the FFI.
> 
> (One of the technical reasons is that various OSes have been known to
> introduce bugs in their FP register and state handling across system calls,
> in which case the "safe" version may turn "complete FP chaos" into merely
> "wrong answer". It's your call whether, or which side, of this bothers you.)

Perhaps I was a little unclear. The FFI-using isDoubleNaN is something
GHC does!

```
libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN"
isDoubleNaN :: Double -> Int
```
```
HsInt
isDoubleNaN(HsDouble d)
{
  union stg_ieee754_dbl u;

  u.d = d;

  return (
u.ieee.exponent  == 2047 /* 2^11 - 1 */ &&  /* Is the exponent all
ones? */
(u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0)
/* and the mantissa non-zero? */
);
}
```

My question is whether it could do better by not doing FFI and instead
computing natively and if not, why not?

> On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk 
> wrote:
> 
>> Hi,
>>
>> Recently at a client I was profiling some code and isDoubleNaN lit up.
>> We were checking a lot of doubles for NaN as that's what customer would
>> send in.
>>
>> I went to investigate and I found that FFI is used to achieve this. I
>> was always under the impression that FFI costs a little. I had at the
>> time replaced the code with a hack with great results:
>>
>> ```
>> isNaN' :: Double -> Bool
>> isNaN' d = d /= d
>> ```
>>
>> While this worked and provided good speedup in my case, this fails
>> catastrophically if the program is compiled with -ffast-math. This is
>> expected. I have since reverted it. Seeking an alternative solution I
>> have thought about re-implementing the C code with a native Haskell
>> version: after all it just checks a few bits. Apparently unsafeCoerce#
>> and friends were a big no-no but I found
>> https://phabricator.haskell.org/D3358 . I have implemented the code at
>> the bottom of this post. Obviously it's missing endianness (compile-time
>> switch).
>>
>> This seems to be faster for smaller `mkInput` list than Prelude.isNaN
>> but slower slightly on the one below. The `/=` version is the fastest
>> but very fragile.
>>
>> My question to you all is whether implementing a version of this
>> function in Haskell makes sense and if not, why not? The
>> stgDoubleToWord64 is implemented in CMM and I don't know anything about
>> the costs of that.
>>
>> * Is there a cheaper alternative to FFI way?
>> * If yes, does anyone know how to write it such that it compiles to same
>> code but without the call overhead? I must have failed below as it's
>> slower on some inputs.
>>
>> Basically if a faster way exists for isNaN, something I have to do a
>> lot, I'd love to hear about it.
>>
>> I leave you with basic code I managed to come up with. 8.4.x only.
>>
>>
>> ```
>> {-# LANGUAGE MagicHash#-}
>> {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-}
>> module Main (main) where
>>
>> import GHC.Float
>> import GHC.Prim
>>
>> isNaN' :: Double -> Bool
>> isNaN' d = d /= d
>>
>> isNaNBits :: Double -> Bool
>> isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
>>   1# -> case bits `and#` mantissaMask of
>> 0## -> False
>> _ -> True
>>   _ -> False
>>   where
>> bits :: Word#
>> bits = stgDoubleToWord64 d
>>
>> expMask, mantissaMask :: Word#
>> expMask = 0x7FF0##
>> mantissaMask = 0x000F##
>>
>> main :: IO ()
>> main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 1)
>> `seq` pure ()
>>   where
>> nan :: Double
>> nan = log (-1)
>>
>> mkInput :: Int -> [Double]
>> mkInput n = take n $ cycle [1, nan]
>>
>> sumFilter :: (Double -> Bool) -> [Double] -> Double
>> sumFilter p = Prelude.sum . Prelude.filter (not . p)
>> ```
>>
>> --
>> Mateusz K.
>> ___
>> ghc-devs mailing list
>> ghc-devs@haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
> 
> 
> 


-- 
Mateusz K.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: FFI-free NaN checks? (isDoubleNan and friends)

2018-03-05 Thread Brandon Allbery
If the FFI version is done with "safe", consider using "unsafe" instead.
There are technical reasons why this is slightly incorrect, but unless
you're fiddling with the CPU's FP control flags they're mostly irrelevant
and you can treat isNaN as pure and non-side-effectful, significantly
reducing the overhead. You may also be able to use "ccall" to take
advantage of C compiler level optimizations, or simply to directly invoke a
CPU-based test with asm(); but you'll need to hide that in a C preprocessor
#define, so that it looks syntactically like a function call to the FFI.

(One of the technical reasons is that various OSes have been known to
introduce bugs in their FP register and state handling across system calls,
in which case the "safe" version may turn "complete FP chaos" into merely
"wrong answer". It's your call whether, or which side, of this bothers you.)

On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk 
wrote:

> Hi,
>
> Recently at a client I was profiling some code and isDoubleNaN lit up.
> We were checking a lot of doubles for NaN as that's what customer would
> send in.
>
> I went to investigate and I found that FFI is used to achieve this. I
> was always under the impression that FFI costs a little. I had at the
> time replaced the code with a hack with great results:
>
> ```
> isNaN' :: Double -> Bool
> isNaN' d = d /= d
> ```
>
> While this worked and provided good speedup in my case, this fails
> catastrophically if the program is compiled with -ffast-math. This is
> expected. I have since reverted it. Seeking an alternative solution I
> have thought about re-implementing the C code with a native Haskell
> version: after all it just checks a few bits. Apparently unsafeCoerce#
> and friends were a big no-no but I found
> https://phabricator.haskell.org/D3358 . I have implemented the code at
> the bottom of this post. Obviously it's missing endianness (compile-time
> switch).
>
> This seems to be faster for smaller `mkInput` list than Prelude.isNaN
> but slower slightly on the one below. The `/=` version is the fastest
> but very fragile.
>
> My question to you all is whether implementing a version of this
> function in Haskell makes sense and if not, why not? The
> stgDoubleToWord64 is implemented in CMM and I don't know anything about
> the costs of that.
>
> * Is there a cheaper alternative to FFI way?
> * If yes, does anyone know how to write it such that it compiles to same
> code but without the call overhead? I must have failed below as it's
> slower on some inputs.
>
> Basically if a faster way exists for isNaN, something I have to do a
> lot, I'd love to hear about it.
>
> I leave you with basic code I managed to come up with. 8.4.x only.
>
>
> ```
> {-# LANGUAGE MagicHash#-}
> {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-}
> module Main (main) where
>
> import GHC.Float
> import GHC.Prim
>
> isNaN' :: Double -> Bool
> isNaN' d = d /= d
>
> isNaNBits :: Double -> Bool
> isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of
>   1# -> case bits `and#` mantissaMask of
> 0## -> False
> _ -> True
>   _ -> False
>   where
> bits :: Word#
> bits = stgDoubleToWord64 d
>
> expMask, mantissaMask :: Word#
> expMask = 0x7FF0##
> mantissaMask = 0x000F##
>
> main :: IO ()
> main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 1)
> `seq` pure ()
>   where
> nan :: Double
> nan = log (-1)
>
> mkInput :: Int -> [Double]
> mkInput n = take n $ cycle [1, nan]
>
> sumFilter :: (Double -> Bool) -> [Double] -> Double
> sumFilter p = Prelude.sum . Prelude.filter (not . p)
> ```
>
> --
> Mateusz K.
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>



-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs