On 03/06/2018 10:43 AM, Brandon Allbery wrote: > I'd in general expect good C code to optimize a little better; and in > particular, decomposing an IEEE float is almost certainly more expensive in > Haskell than in C, because unions let you cheat. (And I recall looking at > the implementation of decodeFloat once; it's significantly longer than that > C.) But I have to wonder if that code would be better done with something > more native; the implementation may be a portable default, and you might be > able to find something x86-specific that is faster.
There's a https://c9x.me/x86/html/file_module_x86_id_316.html that the ‘d /= d’ way compiles to. I suppose maybe I could just keep using that and fall back onto isDoubleNaN if __FAST_MATH__ is set… > On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk <fuuze...@fuuzetsu.co.uk> > wrote: > >> On 03/05/2018 10:23 PM, Brandon Allbery wrote: >>> If the FFI version is done with "safe", consider using "unsafe" instead. >>> There are technical reasons why this is slightly incorrect, but unless >>> you're fiddling with the CPU's FP control flags they're mostly irrelevant >>> and you can treat isNaN as pure and non-side-effectful, significantly >>> reducing the overhead. You may also be able to use "ccall" to take >>> advantage of C compiler level optimizations, or simply to directly >> invoke a >>> CPU-based test with asm(); but you'll need to hide that in a C >> preprocessor >>> #define, so that it looks syntactically like a function call to the FFI. >>> >>> (One of the technical reasons is that various OSes have been known to >>> introduce bugs in their FP register and state handling across system >> calls, >>> in which case the "safe" version may turn "complete FP chaos" into merely >>> "wrong answer". It's your call whether, or which side, of this bothers >> you.) >> >> Perhaps I was a little unclear. The FFI-using isDoubleNaN is something >> GHC does! >> >> ``` >> libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN" >> isDoubleNaN :: Double -> Int >> ``` >> ``` >> HsInt >> isDoubleNaN(HsDouble d) >> { >> union stg_ieee754_dbl u; >> >> u.d = d; >> >> return ( >> u.ieee.exponent == 2047 /* 2^11 - 1 */ && /* Is the exponent all >> ones? */ >> (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0) >> /* and the mantissa non-zero? */ >> ); >> } >> ``` >> >> My question is whether it could do better by not doing FFI and instead >> computing natively and if not, why not? >> >>> On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk < >> fuuze...@fuuzetsu.co.uk> >>> wrote: >>> >>>> Hi, >>>> >>>> Recently at a client I was profiling some code and isDoubleNaN lit up. >>>> We were checking a lot of doubles for NaN as that's what customer would >>>> send in. >>>> >>>> I went to investigate and I found that FFI is used to achieve this. I >>>> was always under the impression that FFI costs a little. I had at the >>>> time replaced the code with a hack with great results: >>>> >>>> ``` >>>> isNaN' :: Double -> Bool >>>> isNaN' d = d /= d >>>> ``` >>>> >>>> While this worked and provided good speedup in my case, this fails >>>> catastrophically if the program is compiled with -ffast-math. This is >>>> expected. I have since reverted it. Seeking an alternative solution I >>>> have thought about re-implementing the C code with a native Haskell >>>> version: after all it just checks a few bits. Apparently unsafeCoerce# >>>> and friends were a big no-no but I found >>>> https://phabricator.haskell.org/D3358 . I have implemented the code at >>>> the bottom of this post. Obviously it's missing endianness (compile-time >>>> switch). >>>> >>>> This seems to be faster for smaller `mkInput` list than Prelude.isNaN >>>> but slower slightly on the one below. The `/=` version is the fastest >>>> but very fragile. >>>> >>>> My question to you all is whether implementing a version of this >>>> function in Haskell makes sense and if not, why not? The >>>> stgDoubleToWord64 is implemented in CMM and I don't know anything about >>>> the costs of that. >>>> >>>> * Is there a cheaper alternative to FFI way? >>>> * If yes, does anyone know how to write it such that it compiles to same >>>> code but without the call overhead? I must have failed below as it's >>>> slower on some inputs. >>>> >>>> Basically if a faster way exists for isNaN, something I have to do a >>>> lot, I'd love to hear about it. >>>> >>>> I leave you with basic code I managed to come up with. 8.4.x only. >>>> >>>> >>>> ``` >>>> {-# LANGUAGE MagicHash #-} >>>> {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm >> #-} >>>> module Main (main) where >>>> >>>> import GHC.Float >>>> import GHC.Prim >>>> >>>> isNaN' :: Double -> Bool >>>> isNaN' d = d /= d >>>> >>>> isNaNBits :: Double -> Bool >>>> isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of >>>> 1# -> case bits `and#` mantissaMask of >>>> 0## -> False >>>> _ -> True >>>> _ -> False >>>> where >>>> bits :: Word# >>>> bits = stgDoubleToWord64 d >>>> >>>> expMask, mantissaMask :: Word# >>>> expMask = 0x7FF0000000000000## >>>> mantissaMask = 0x000FFFFFFFFFFFFF## >>>> >>>> main :: IO () >>>> main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) >>>> `seq` pure () >>>> where >>>> nan :: Double >>>> nan = log (-1) >>>> >>>> mkInput :: Int -> [Double] >>>> mkInput n = take n $ cycle [1, nan] >>>> >>>> sumFilter :: (Double -> Bool) -> [Double] -> Double >>>> sumFilter p = Prelude.sum . Prelude.filter (not . p) >>>> ``` >>>> >>>> -- >>>> Mateusz K. >>>> _______________________________________________ >>>> ghc-devs mailing list >>>> ghc-devs@haskell.org >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>> >>> >>> >> >> >> -- >> Mateusz K. >> > > > -- Mateusz K. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs