Re: Broken ghc-7.0.3/vector combination?
On 21/04/2011 12:29, Felipe Almeida Lessa wrote: On Thu, Apr 21, 2011 at 8:08 AM, Simon Marlowmarlo...@gmail.com wrote: Right, it could be related to this. However this change was made to eliminate some causes of NaNs, see: http://hackage.haskell.org/trac/ghc/ticket/4914 So I'm very depressed if it managed to introduce NaNs somehow. Could someone make a ticket for this, with the smallest test case found so far please? So in principle the LLVM backend should be fine? Yes, also compiling with -msse2 on x86 should be fine. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On 20/04/2011 18:28, Ian Lynagh wrote: On Wed, Apr 20, 2011 at 05:02:50PM +0200, Daniel Fischer wrote: So, is it possible that some change in ghc-7.0.3 vs. the previous versions Very little changed between 7.0.2 and 7.0.3. The only thing that jumps out to me as possibly being relevant is: diff -ur 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs --- 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 2011-02-28 18:10:06.0 + +++ 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs 2011-03-26 18:10:04.0 + @@ -734,6 +734,7 @@ where p insn r = case insn of CALL _ _ - GFREE : insn : r JMP _- GFREE : insn : r +JXX_GBL _ _ - GFREE : insn : r _- insn : r Right, it could be related to this. However this change was made to eliminate some causes of NaNs, see: http://hackage.haskell.org/trac/ghc/ticket/4914 So I'm very depressed if it managed to introduce NaNs somehow. Could someone make a ticket for this, with the smallest test case found so far please? Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Thu, Apr 21, 2011 at 8:08 AM, Simon Marlow marlo...@gmail.com wrote: Right, it could be related to this. However this change was made to eliminate some causes of NaNs, see: http://hackage.haskell.org/trac/ghc/ticket/4914 So I'm very depressed if it managed to introduce NaNs somehow. Could someone make a ticket for this, with the smallest test case found so far please? So in principle the LLVM backend should be fine? Thanks, -- Felipe. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
I tried ghc --make -fforce-recomp simpleTest.hs with -O0 and -O1 and -O2 on OS X with 64-bit ghc-7.0.3 All versions ran without printing errors. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Thursday 21 April 2011 17:18:47, Chris Kuklewicz wrote: I tried ghc --make -fforce-recomp simpleTest.hs with -O0 and -O1 and -O2 on OS X with 64-bit ghc-7.0.3 All versions ran without printing errors. I seem to recall that GHC produces sse2 code on x86_64. If that's correct, the effect probably won't be reproducible on that architecture, since it doesn't occur with -msse2 on x86 either (well, at least on my machine). ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Thu, Apr 21, 2011 at 10:43 AM, Daniel Fischer daniel.is.fisc...@googlemail.com wrote: On Thursday 21 April 2011 17:18:47, Chris Kuklewicz wrote: I tried ghc --make -fforce-recomp simpleTest.hs with -O0 and -O1 and -O2 on OS X with 64-bit ghc-7.0.3 All versions ran without printing errors. I seem to recall that GHC produces sse2 code on x86_64. If that's correct, the effect probably won't be reproducible on that architecture, since it doesn't occur with -msse2 on x86 either (well, at least on my machine). This is GHC 7.0.3 on Windows XP 32-bit: $ ghc --version The Glorious Glasgow Haskell Compilation System, version 7.0.3 $ ls cabal-dev bin cabal.config doc logs packages packages-7.0.3.conf primitive-0.3.1 vector-0.7.0.1 vector-algorithms-0.4 $ ./cabal-dev/bin/test.exe After sorting: 674 NaNs. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Thursday 21 April 2011 13:08:22, Simon Marlow wrote: On 20/04/2011 18:28, Ian Lynagh wrote: On Wed, Apr 20, 2011 at 05:02:50PM +0200, Daniel Fischer wrote: So, is it possible that some change in ghc-7.0.3 vs. the previous versions Very little changed between 7.0.2 and 7.0.3. The only thing that jumps out to me as possibly being relevant is: diff -ur 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs --- 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 2011-02-28 18:10:06.0 + +++ 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs 2011-03-26 18:10:04.0 + @@ -734,6 +734,7 @@ where p insn r = case insn of CALL _ _ - GFREE : insn : r JMP _- GFREE : insn : r +JXX_GBL _ _ - GFREE : insn : r _- insn : r Right, it could be related to this. I'm afraid it is. Comparing the dumped asm, after renaming identifiers, the only difference between the assembly produced by 7.0.2 and 7.0.3 is the appearance of 59 ffree %st(0) ;ffree %st(1) ;ffree %st(2) ;ffree %st(3) ffree %st(4) ;ffree %st(5) in 7.0.3's code which aren't in 7.0.2's. However this change was made to eliminate some causes of NaNs, see: http://hackage.haskell.org/trac/ghc/ticket/4914 So I'm very depressed if it managed to introduce NaNs somehow. Could someone make a ticket for this, with the smallest test case found so far please? http://hackage.haskell.org/trac/ghc/ticket/5149 Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
Daniel Fischer wrote: Further investigation of the sorting code in vector-algorithms revealed no bugs there, and if the runtime was forced to keep a keen eye on the indices, by replacing unsafeRead/Write/Swap with their bounds-checked counterparts or by 'trace'ing enough of their uses, the NaNs did not appear. Did you replace them in vector-algorithms or in vector itself? So, is it possible that some change in ghc-7.0.3 vs. the previous versions caused a bad interaction between ghc-optimisations and vector fusion resulting in bad vector reads/writes? Am I right in assuming that this happens in code which uses only mutable vectors? Fusion only works for immutable ones so it shouldn't really affect things here. Have you tried playing around with code generation flags like -msse2? In any case, I would try to take a look at this if you tell me how to reproduce. Roman ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Wed, Apr 20, 2011 at 05:02:50PM +0200, Daniel Fischer wrote: So, is it possible that some change in ghc-7.0.3 vs. the previous versions Very little changed between 7.0.2 and 7.0.3. The only thing that jumps out to me as possibly being relevant is: diff -ur 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs --- 7.0.2/ghc-7.0.2/compiler/nativeGen/X86/Instr.hs 2011-02-28 18:10:06.0 + +++ 7.0.3/ghc-7.0.3/compiler/nativeGen/X86/Instr.hs 2011-03-26 18:10:04.0 + @@ -734,6 +734,7 @@ where p insn r = case insn of CALL _ _ - GFREE : insn : r JMP _- GFREE : insn : r +JXX_GBL _ _ - GFREE : insn : r _- insn : r Thanks Ian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Wednesday 20 April 2011 19:11:07, Roman Leshchinskiy wrote: Daniel Fischer wrote: Further investigation of the sorting code in vector-algorithms revealed no bugs there, and if the runtime was forced to keep a keen eye on the indices, by replacing unsafeRead/Write/Swap with their bounds-checked counterparts or by 'trace'ing enough of their uses, the NaNs did not appear. Did you replace them in vector-algorithms or in vector itself? vector-algorithms only. So, is it possible that some change in ghc-7.0.3 vs. the previous versions caused a bad interaction between ghc-optimisations and vector fusion resulting in bad vector reads/writes? Am I right in assuming that this happens in code which uses only mutable vectors? Yes, the sorting uses mutable vectors, in this case unboxed Double vectors. Fusion only works for immutable ones so it shouldn't really affect things here. Ah, didn't know that. Another suspect gone. Have you tried playing around with code generation flags like -msse2? No, not yet. So far only -O2 (with -fspec-constr-count=5 in the presence of many trace calls) and -O0. In any case, I would try to take a look at this if you tell me how to reproduce. I'll prepare a bundle, I'm afraid it won't be small, though. And it might be architecture dependent, so I can't guarantee that you will be able to reproduce it. But Bryan said on IRC yesterday that others have reported similar issues with criterion output, so it may well be cross-platform reproducible. Cheers, Daniel ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Wednesday 20 April 2011 20:25:34, Bryan O'Sullivan wrote: On Wed, Apr 20, 2011 at 10:44 AM, Daniel Fischer daniel.is.fisc...@googlemail.com wrote: I'll prepare a bundle, I'm afraid it won't be small, though. And it might be architecture dependent, so I can't guarantee that you will be able to reproduce it. But Bryan said on IRC yesterday that others have reported similar issues with criterion output, so it may well be cross-platform reproducible. Daniel, are you sure this is down to a 7.0.2/7.0.3 difference, and not perhaps due to just a bug in criterion itself? I'm sure it's not criterion, because after I've found that NaNs were introduced to the resamples vectors during sorting (check the entire vectors for NaNs before and aftersorting, tracing the count; before: 0, afterwards often quite a number, sometimes close to 10%), the further tests didn't involve criterion anymore. criterion is simply the most obvious place to see the NaNs show up (with 5-10% NaNs among the resamples, it won't take too long to see one pop up). It could be a bug in statistics, but I'm pretty sure this one's not due to statistics either, since fiddling with vector-algorithms made the NaNs disappear - btw., Bryan, using the heap sort instead of introsort, I haven't found any NaNs in my tests, so temporarily switching the algorithm might cure the symptoms. Dan Doel and I spent not too little time scrutinising the vector-algorithms code without finding an issue. Also, replacing the unsafe access with bounds-checked access (apparently) eliminated the NaNs, and 7.0.1 and 7.0.2 didn't produce any in my tests, yet more points to believe that it's none of these packages producing the behaviour, but rather something that changed between 7.0.2 and 7.0.3 -- however, so far in this matter my guesses as to what's responsible have been wrong, so I wouldn't be surprised if it's something entirely different. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Wed, Apr 20, 2011 at 3:01 PM, Daniel Fischer daniel.is.fisc...@googlemail.com wrote: I'm sure it's not criterion, because after I've found that NaNs were introduced to the resamples vectors during sorting (check the entire vectors for NaNs before and aftersorting, tracing the count; before: 0, afterwards often quite a number, sometimes close to 10%), the further tests didn't involve criterion anymore. criterion is simply the most obvious place to see the NaNs show up (with 5-10% NaNs among the resamples, it won't take too long to see one pop up). It could be a bug in statistics, but I'm pretty sure this one's not due to statistics either, since fiddling with vector-algorithms made the NaNs disappear - btw., Bryan, using the heap sort instead of introsort, I haven't found any NaNs in my tests, so temporarily switching the algorithm might cure the symptoms. It's not a statistics bug. I'm reproducing it here using just vector-algorithms. Fill a vector of size N with [N..1], and (intro) sort it, and you get NaNs. But only with -O or above. Without optimization it doesn't happen (and nothing seems to be reading/writing out of bounds, as I compiled vector with UnsafeChecks earlier and it didn't complain). Filling the vector with [1..N] also doesn't trigger the NaNs. [0,0..0] and [0,0..1] trigger it. I don't know what's going on yet. I have trouble believing it's a bug in vector-algorithms code, though, as I don't think I've written any RULEs (just INLINEs), and that's the one thing that comes to mind in library code that could cause a difference between -O0 and -O. So I'd tentatively suggest it's a vector, base or compiler bug. The above testing is on 64-bit windows running a 32-bit copy of GHC, for reference. My ability to investigate this will be a bit limited for the near future. If someone definitively tracks it down to bugs in my code, though, let me know, and I'll try and push a new release up on hackage. -- Dan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Broken ghc-7.0.3/vector combination?
On Wednesday 20 April 2011 21:55:51, Dan Doel wrote: It's not a statistics bug. I'm reproducing it here using just vector-algorithms. Yep. Attached a simple testcasewhich reproduces it and uses only vector and vector-algorithms. Fill a vector of size N with [N..1], and (intro) sort it, and you get NaNs. But only with -O or above. However, for me the NaNs disappear with the -msse2 option. Without optimization it doesn't happen (and nothing seems to be reading/writing out of bounds, as I compiled vector with UnsafeChecks earlier and it didn't complain). Nor does it happen here with 7.0.2 or 7.0.1. Filling the vector with [1..N] also doesn't trigger the NaNs. [0,0..0] and [0,0..1] trigger it. I don't know what's going on yet. I have trouble believing it's a bug in vector-algorithms code, though, as I don't think I've written any RULEs (just INLINEs), and that's the one thing that comes to mind in library code that could cause a difference between -O0 and -O. So I'd tentatively suggest it's a vector, base or compiler bug. The above testing is on 64-bit windows running a 32-bit copy of GHC, for reference. 32-bit linux here My ability to investigate this will be a bit limited for the near future. If someone definitively tracks it down to bugs in my code, though, let me know, and I'll try and push a new release up on hackage. -- Dan {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.Vector.Unboxed.Mutable as MU import Data.Vector.Unboxed.Mutable (IOVector, unsafeRead, unsafeWrite, new) import qualified Data.Vector.Algorithms.Intro as I import Control.Monad (when) import System.Environment (getArgs) countNaNs :: IOVector Double - IO Int countNaNs a = go 0 0 where len = MU.length a go !ct i | i len = do x - unsafeRead a i go (if isNaN x then ct+1 else ct) (i+1) | otherwise = return ct sample :: Int - IO (IOVector Double) sample k = do a - new k let foo :: Double - Double foo x = 1.0 + sin x / x fill i x | i k = do unsafeWrite a i (foo x) fill (i+1) (x+1.0) | otherwise = return a fill 0 (fromIntegral k * 10) main :: IO () main = do args - getArgs let k = case args of (arg:_) - read arg _ - 1 a - sample k b - countNaNs a when (b /= 0) (putStrLn $ Before sorting: ++ show b ++ NaNs.) I.sort a c - countNaNs a when (c /= 0) (putStrLn $ After sorting: ++ show c ++ NaNs.) ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users