Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Ketil Malde wrote:
> Ketil Malde <[EMAIL PROTECTED]> writes:
> > To get memory consumption down, I tried a strict "update" function:
> >
> >     update k fm = let x = (get hash1 k + get fm k)
> >                   in x `seq` addToFM fm k x
> >
> > which slowed the program down(!),

Yes, that fixes(?) it. The strict update removes the space leak and makes the FiniteMap perform as expected.

> I wonder if this isn't due to never evaluating the values for
> "foo_2" to "foo_9998" because of laziness?

Maybe. On a whim I thought that maybe unevaluated addition thunks from addToFM_C were piling up, so I changed to addListToFM_C instead...

    let update k fm = addListToFM_C (+) fm $ replicate (read n) (k, get hash1 k)
    let res = foldr update hash2 keys

...but that hardly made a difference. So the search continues.

Greg Buchholz

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Ketil Malde <[EMAIL PROTECTED]> writes:
> To get memory consumption down, I tried a strict "update" function:
>
>     update k fm = let x = (get hash1 k + get fm k)
>                   in x `seq` addToFM fm k x
>
> which slowed the program down(!),

I wonder if this isn't due to never evaluating the values for "foo_2" to "foo_9998" because of laziness?

> BTW, I looked at the shootout web pages, but I couldn't find the
> specification for any of the benchmarks. What is and isn't allowed?

For instance, changing the order of the updates shaves another 10-20% off the time (because of cache-friendliness, I suppose):

    - let res = foldr update hash2 (concat $ replicate (read n) keys)
    + let res = foldr update hash2 (concat $ map (replicate (read n)) keys)

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
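The reordering Ketil describes amounts to grouping all N updates for one key together, instead of interleaving N whole passes over the key list. A minimal illustration of the two orderings (a sketch, not from the original post):

```haskell
-- Compare the two traversal orders from Ketil's diff, with N = 2
-- and a toy key list.
main :: IO ()
main = do
  -- original: N complete passes, so updates to any one key are far
  -- apart in time (cache-unfriendly)
  print (concat (replicate 2 [1, 2, 3 :: Int]))        -- → [1,2,3,1,2,3]
  -- reordered: all N updates for each key happen back to back
  print (concat (map (replicate 2) [1, 2, 3 :: Int]))  -- → [1,1,2,2,3,3]
```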
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Greg Buchholz <[EMAIL PROTECTED]> writes:
> I've been looking at the other shootout results (with the hope of
> learning something about making Haskell programs faster/less memory
> hungry) and I couldn't quite figure out why the "Hashes, part II" test
> consumes so much memory ( http://shootout.alioth.debian.org/bench/hash2/ ).
> So I started to try heap profiling, and generated the following graphs
> for the different types of profiles...
>
>     biography => http://sleepingsquirrel.org/haskell/hash2_b.ps
>     retainer  => http://sleepingsquirrel.org/haskell/hash2_r.ps
>     closure   => http://sleepingsquirrel.org/haskell/hash2_d.ps
>     type      => http://sleepingsquirrel.org/haskell/hash2_y.ps
>     cost cntr => http://sleepingsquirrel.org/haskell/hash2_c.ps
>
> ...but I have a hard time figuring out how to prevent something like
> "stg_ap_3_upd_info" or "void" cells from consuming so much memory.

One thing you could do is to move the pure definitions (constants and functions) out of the monad. This will make them separate cost centres, with their own profile information. I toyed with this, but admittedly, it didn't change much in this case. I think it is better style, though.

A simple way to improve speed marginally is to specify Int instead of letting things default to Integer. A more complex way, saving about 60% of the time, is to use unboxed arrays instead of strings for the keys - memory consumption seems to be the same, though.

To get memory consumption down, I tried a strict "update" function:

    update k fm = let x = (get hash1 k + get fm k)
                  in x `seq` addToFM fm k x

which slowed the program down(!), but reduced memory consumption from about 25Mb to 1.5Mb. So it seems that the memory consumption is due to unevaluated values in the FMs.

BTW, I looked at the shootout web pages, but I couldn't find the specification for any of the benchmarks. What is and isn't allowed?
-kzm

    import System (getArgs)
    import Data.FiniteMap
    import Data.Array.Unboxed
    import Maybe

    type Key = UArray Int Char
    type Map = FiniteMap (UArray Int Char) Int

    hash1, hash2 :: Map
    hash1 = listToFM $ zip keys [0..]
    hash2 = listToFM $ zip keys (repeat 0)

    keys :: [Key]
    keys = map (\x -> listArray (1, 4 + length (show x)) ("foo_" ++ show x)) [0..]

    get :: Map -> Key -> Int
    get fm k = fromJust $ lookupFM fm k

    update :: Key -> Map -> Map
    update k fm = let x = (get hash1 k + get fm k)
                  in x `seq` addToFM fm k x

    foo_1 = keys !! 1
    foo_ = keys !!

    main = do
      [n] <- getArgs
      let res = foldr update hash2 (concat $ replicate (read n) keys)
      putStrLn $ unwords $ map show
        [get hash1 foo_1, get hash1 foo_, get res foo_1, get res foo_]

--
If I haven't seen further, it is by standing in the footprints of giants
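Ketil's ``x `seq` addToFM fm k x`` trick - forcing the sum before it is stored - is exactly what the value-strict map API does today. A hedged sketch of the same fix in modern terms (Data.FiniteMap is long gone; the keys and counts below are made up for illustration):

```haskell
import qualified Data.Map as Lazy          -- lazy in values: (+) thunks pile up
import qualified Data.Map.Strict as Strict -- forces each value on insert, like the `seq`

-- 100 updates per key, mimicking the benchmark's repeated-update pattern
ks :: [Int]
ks = concat (replicate 100 [1 .. 1000])

leaky :: Lazy.Map Int Int
leaky = foldl (\m k -> Lazy.insertWith (+) k 1 m) Lazy.empty ks

tight :: Strict.Map Int Int
tight = foldl (\m k -> Strict.insertWith (+) k 1 m) Strict.empty ks

main :: IO ()
main = print (Lazy.lookup 1 leaky, Strict.lookup 1 tight)  -- → (Just 100,Just 100)
```

Both maps hold the same counts; the difference is that the lazy one retains a 100-deep chain of `(+)` thunks per key until a lookup forces it, which is the space leak the thread is chasing.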
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Keith Wansbrough wrote:
> I just saw this on the OCaml list (in a posting by "Rafael 'Dido'
> Sevilla" <[EMAIL PROTECTED]> in the "Observations on OCaml vs. Haskell"
> thread). I can't believe that a simple "wc" implementation should be
> 570 times slower in Haskell than OCaml - could someone investigate and
> fix the test?

I've been looking at the other shootout results (with the hope of learning something about making Haskell programs faster/less memory hungry) and I couldn't quite figure out why the "Hashes, part II" test consumes so much memory ( http://shootout.alioth.debian.org/bench/hash2/ ). So I started to try heap profiling, and generated the following graphs for the different types of profiles...

    biography => http://sleepingsquirrel.org/haskell/hash2_b.ps
    retainer  => http://sleepingsquirrel.org/haskell/hash2_r.ps
    closure   => http://sleepingsquirrel.org/haskell/hash2_d.ps
    type      => http://sleepingsquirrel.org/haskell/hash2_y.ps
    cost cntr => http://sleepingsquirrel.org/haskell/hash2_c.ps

...but I have a hard time figuring out how to prevent something like "stg_ap_3_upd_info" or "void" cells from consuming so much memory. Anyone have pointers on how to best use the profile information? I'm still trying to digest "Heap Profiling for Space Efficiency" http://portal.acm.org/citation.cfm?id=734156 Are there any other related papers out there? (Of course it might be the case that I need a FiniteMap tutorial.) Here's the code in question...

    import System (getArgs)
    import Data.FiniteMap

    main = do [n] <- getArgs
              let get fm k = lookupWithDefaultFM fm 0 k
              let keys = map (\x -> "foo_" ++ show x) [0..]
              let hash1 = listToFM $ zip keys [0..]
              let hash2 = listToFM $ zip keys (repeat 0)
              let update k fm = addToFM_C (+) fm k (get hash1 k)
              let res = foldr update hash2 (concat $ replicate (read n) keys)
              putStrLn $ unwords $ map show
                [get hash1 "foo_1", get hash1 "foo_", get res "foo_1", get res "foo_"]

Thanks,

Greg Buchholz
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Ketil Malde <[EMAIL PROTECTED]> writes:
>> wc :: !(Int,Int,Int) -> Char -> (Int, Int, Int)
> I'm not sure if that was your question

Sorry about that, brain malfunction - bangs are for data declarations; I'll get that cup of coffee now. I guess what you really want to do is to put some `seq`s in there. Something like:

    wc (cs,ws,ls) ... = cs `seq` ws `seq` ls `seq` ...the def. of wc...

which evaluates the Ints before doing anything else. Or use ($!) (like function application ($), but strict).

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
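Spelled out in full, the `seq`-on-the-components fix Ketil sketches might look like this (a sketch following the thread's simplified space-counting rules, not code from the shootout):

```haskell
-- Force each counter before building the next tuple, so no deep chain
-- of (+1) thunks can accumulate inside the accumulator.
wc :: (Int, Int, Int) -> Char -> (Int, Int, Int)
wc (l, w, c) x =
  l `seq` w `seq` c `seq`
  case x of
    '\n' -> (l + 1, w, c + 1)
    ' '  -> (l, w + 1, c + 1)
    _    -> (l, w, c + 1)

main :: IO ()
main = print (foldl wc (0, 0, 0) "two words\n")  -- → (1,1,10)
```

The `seq`s force the previous step's counters before the new tuple is built, so each component is at most one thunk deep; a strict-field data type (as later in the thread) achieves the same thing more declaratively.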
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Georg Martius wrote:
> Some more general comment: the code for the shootout doesn't need to be
> extremely fast in my eyes; it needs to be elegant and reasonable at
> performance and memory consumption (in this order). I don't want to say
> that Tomasz's solution is bad, but it is not a typical Haskell-style
> application. If someone (not a Haskeller) looks at the implementation it
> should be very obvious and clear.

It might also be nice if the code would run under the other Haskell compilers like Hugs and nhc98 right out-of-the-box.

Greg Buchholz
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Andrew Cheadle wrote:
> +RTS -Sstderr -RTS and +RTS -sstderr -RTS will probably indicate why.
> I'd be surprised if the amount of data copied for the semi-space
> collector isn't much less than for the generational.

Ahh. Data copied with '-G1' = 58MB vs. 203MB without. For posterity's sake, here are the numbers...

With '-G1'
----------

    306,616,872 bytes allocated in the heap
     58,844,344 bytes copied during GC
         99,316 bytes maximum residency (1169 sample(s))
           1169 collections in generation 0 ( 0.62s)
              1 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    0.68s  (  0.71s elapsed)
    GC    time    0.62s  (  0.68s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    1.30s  (  1.39s elapsed)

    %GC time      47.7%  (48.9% elapsed)
    Alloc rate    450,907,164 bytes per MUT second
    Productivity  52.3% of total user, 48.9% of total elapsed

Without
-------

    306,616,872 bytes allocated in the heap
    203,339,812 bytes copied during GC
        109,088 bytes maximum residency (131 sample(s))
           1169 collections in generation 0 ( 2.22s)
            131 collections in generation 1 ( 0.05s)
              2 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    0.79s  (  0.92s elapsed)
    GC    time    2.27s  (  2.23s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    3.06s  (  3.15s elapsed)

    %GC time      74.2%  (70.8% elapsed)
    Alloc rate    388,122,622 bytes per MUT second
    Productivity  25.8% of total user, 25.1% of total elapsed

Greg Buchholz
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Hi Greg,

> Anyone have an explanation for the 2x speed
> increase for running Kevin's version with '+RTS -G1'?

+RTS -Sstderr -RTS and +RTS -sstderr -RTS will probably indicate why. I'd be surprised if the amount of data copied for the semi-space collector isn't much less than for the generational.

Chances are that data is dying off very quickly and very little is being copied for the semi-space collector - the allocation area has a variable sizing policy, and this size of allocation area is sufficient for the majority of the data to die off before it is filled and a GC kicks in - hence very little is copied. However, for the generational collector, the nursery is of fixed size. Here, the lifetime of the data is probably longer than the time it takes for the nursery to be filled, so a minor GC kicks in and the data is promoted to generation 1 (hence copied), where it then probably dies off but can't be collected until a major collection kicks in. Probably, or something like that ;-)

Cheers

Andy

*
* Andrew Cheadle           email: [EMAIL PROTECTED]
* Department of Computing  http://www.doc.ic.ac.uk/~amc4/
* Imperial College
* University of London
*
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Malcolm Wallace wrote:
> Here are the results, in seconds, on my machine (2.4GHz x86/Linux)
> for the suggested input (N=500) from the shootout site. All Haskell
> versions were compiled with ghc-5.04.2 -O2.

I thought I'd take a stab at timing a few of the examples with different compiler versions to see what difference that would make (ghc-6.2.1 vs. ghc-6.3.20040928). I compared Kevin Everets' version with the two Tomasz Zielonka versions. I ran the test with N=2500 (i.e. 2500 copies of the original file, which is what is apparently used in the shootout) on my AthlonXP 1600 under x86/Linux.

                        6.2.1     6.3.20040928
                        -----     ------------
    Kevin               3.615s    3.156s
    Kevin (+RTS -G1)    1.666s    1.405s
    Tomasz (pretty)     0.725s    0.481s
    Tomasz (fast)       0.403s    0.430s

Interesting to see the speed increase going from 6.2.1 to 6.3 for Tomasz's pretty example. Anyone have an explanation for the 2x speed increase for running Kevin's version with '+RTS -G1'?

(And for reference, here are the results on my machine for the perl and gcc versions of the test and gnu/wc.)

    perl-5.8.4    0.535s
    gcc-3.4.2     0.102s
    gnu/wc        0.435s

Greg Buchholz
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Hi folks,

On Thu, 30 Sep 2004 01:02:54 +0100, Sam Mason <[EMAIL PROTECTED]> wrote:
> Greg Buchholz wrote:
> > The algorithm isn't correct (it counts spaces instead of words), but
> > anyone have advice for improving its performance?
> You probably want some strictness annotations in there. . .

Last night I tried to improve Greg's wc in a simple fashion and came up with the same idea, to make a new data type with strict fields. I thought: why couldn't one add some kind of strictness annotation to the function type? First attempt:

    wc :: !(Int,Int,Int) -> Char -> (Int, Int, Int)

As far as I know the compiler does strictness analysis to find strict arguments in a function anyway. Would it make sense to allow this kind of explicit strictness? Where are the problems with that? I mean, laziness is really useful and it is our best friend in this kind of application, since we can do stream processing without implementing a buffer and so on. On the other hand one gets occasionally trapped by it, and it is not always easy to grasp why.

Some more general comment: the code for the shootout doesn't need to be extremely fast in my eyes; it needs to be elegant and reasonable at performance and memory consumption (in this order). I don't want to say that Tomasz's solution is bad, but it is not a typical Haskell-style application. If someone (not a Haskeller) looks at the implementation it should be very obvious and clear.

The last few weeks the list has been full of performance questions (including my own ones) and it is really a pity that it is such an issue. I believe that most problems occurring with performance and memory consumption could be easily solved by partial and explicit strictness. Please enlighten me if I am wrong.

Regards,
Georg
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
At 19:39 30/09/04 +0200, Tomasz Zielonka wrote:
> What I like about GHC is that I can start from simple, high-level,
> sometimes slow solutions, but if there are efficiency problems, there
> is a big chance that I can solve them without switching the language.

That's a very good point, I think. One to hang on to.

> > But I did wonder if it wouldn't be possible to also abstract out the I/O
> > element of your 'fileIterate', using instead something like:
> >     streamIterate :: [b] -> (a -> b -> a) -> a -> IO a
> It seems to be a variant of foldl. You can eliminate IO from the return
> type, or is there some reason for it?

Doh! (Why didn't I spot that?) All roads lead to Rome, or something like that? There seems to be a recurring tension between how much to specialize and how much to generalize. Maybe it should be something like:

    streamIterate :: (Monad m) => [b] -> (a -> b -> m a) -> a -> m a

? Er, but that's similarly a variation of foldM, right? Or maybe my earlier idea was closer:

    streamIterate :: (Monad m) => (c -> m (b,c)) -> c -> (a -> b -> m a) -> a -> m a

? Hmmm... I feel like an (intellectual) bull-in-a-china-shop here. I'm blundering about on the trail of a delicate and elegant idea that I'm sure others could dissect far more clearly. What I'm trying to capture (I think) is that there's some baggage to do with accessing the raw data and emitting the desired result that needs to be carried along (or interleaved) with the required computation on that data. Does this make any sense, or am I descending into farce here?

#g

Graham Klyne
For email: http://www.ninebynine.org/#Contact
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Thu, Sep 30, 2004 at 05:40:58PM +0100, Graham Klyne wrote:
> 2. Your fileIterate certainly looks nicer (to me) than your other
> solution, but...

It looks nicer to me too.

> Tagging along with this debate, I found myself wondering if, in order to
> get performance comparable to other languages, it is really necessary to
> write code like code in other languages.

Maybe it's not necessary to get comparable performance, but it often is if you want to get the best performance possible. Of course I wouldn't mind if GHC optimised high-level code so well that we wouldn't have to do it, but I guess it's just not that easy.

What I like about GHC is that I can start from simple, high-level, sometimes slow solutions, but if there are efficiency problems, there is a big chance that I can solve them without switching the language.

> But I did wonder if it wouldn't be possible to also abstract out the I/O
> element of your 'fileIterate', using instead something like:
>     streamIterate :: [b] -> (a -> b -> a) -> a -> IO a

It seems to be a variant of foldl. You can eliminate IO from the return type, or is there some reason for it?

Best regards,
Tom

--
.signature: Too many levels of symbolic links
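Tomasz's observation can be made concrete: the pure `streamIterate` is just `foldl` with its arguments permuted, and the monadic signature Graham arrives at is likewise `foldM`. A small sketch (the names are the thread's proposals; the definitions are the obvious ones, stated here as an assumption):

```haskell
import Control.Monad (foldM)

-- streamIterate as proposed, with IO dropped from the result type:
-- it is foldl with the list argument first
streamIterate :: [b] -> (a -> b -> a) -> a -> a
streamIterate xs f z = foldl f z xs

-- the monadic variant is foldM in the same clothing
streamIterateM :: Monad m => [b] -> (a -> b -> m a) -> a -> m a
streamIterateM xs f z = foldM f z xs

main :: IO ()
main = do
  print (streamIterate "abcd" (\n _ -> n + 1) (0 :: Int))     -- → 4
  total <- streamIterateM [1, 2, 3 :: Int] (\a b -> return (a + b)) 0
  print total                                                 -- → 6
```

Graham's generator-based version (the `(c -> m (b, c))` argument) is the further step of abstracting the list itself away, which is what `fileIterate` does with a file handle.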
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
At 16:56 30/09/04 +0200, Tomasz Zielonka wrote:
> Then how about a solution like this: I took your program but used my
> fast fileIterate instead of ,,foldl over getContents''. I also added
> {-# OPTIONS -funbox-strict-fields #-}, and played a bit to get the best
> optimisations from GHC. It's about 7 times faster this way, but it's
> still two times slower than the solution I sent to shootout.
>
> Devilish plan: Maybe we could have some variants of fileIterate in
> GHC's libraries? ;->

Two responses:

1. I agree that providing the right kind of library functions (and material explaining how to use them) may be a key to getting efficient code without losing high-level forms of expression.

2. Your fileIterate certainly looks nicer (to me) than your other solution, but...

Tagging along with this debate, I found myself wondering if, in order to get performance comparable to other languages, it is really necessary to write code like code in other languages. E.g., I thought one of the lessons of John Hughes' "Why Functional Programming Matters" was that one can achieve greater efficiencies by climbing higher rather than dropping down to the level of other languages. Your fileIterate looks to me like a step in the right direction.

But I did wonder if it wouldn't be possible to also abstract out the I/O element of your 'fileIterate', using instead something like:

    streamIterate :: [b] -> (a -> b -> a) -> a -> IO a

(I was originally thinking of something like:

    streamIterate :: (c -> (b,c)) -> c -> (a -> b -> a) -> a -> IO a

where the first argument is a function that takes a sequence generator and returns the next member of the sequence + new generator, and the 2nd arg is the initial generator.)

For such an approach to be useful, I think it would also be important to have variations of functions like length, lines, and words that can be combined to make a function like your wc'.

#g

Graham Klyne
For email: http://www.ninebynine.org/#Contact
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Sam Mason wrote:
> You probably want some strictness annotations in there. . .

Now we're getting somewhere. When I replace the tuples with my own (strict) data structure, it gets about 7.5 times faster than the original shootout example (or about 24 times faster than the version with tuples). I get another 2x speedup when I pass '+RTS -G1' to the executable. So the version below is about 15 times faster than the original using the 3MB data file from the shootout. (Now we're only about 40x slower than ocaml.) Don't forget to turn on '-funbox-strict-fields' for an additional improvement.

    -- Compile with...
    --   ghc -O2 -ddump-simpl -fvia-c -funbox-strict-fields -o wc_fold2 wc_fold2.hs
    -- execute as...
    --   ./wc_fold2 +RTS -G1
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Thu, Sep 30, 2004 at 09:49:46AM -0400, Kevin Everets wrote:
> I took Georg's, fixed the word count logic and added prettier
> printing, and then combined it with Sam's main (which I find more
> elegant, but others may find less straightforward). I think it
> strikes a good balance between efficiency and elegance.

Then how about a solution like this: I took your program but used my fast fileIterate instead of ,,foldl over getContents''. I also added {-# OPTIONS -funbox-strict-fields #-}, and played a bit to get the best optimisations from GHC. It's about 7 times faster this way, but it's still two times slower than the solution I sent to shootout.

Devilish plan: Maybe we could have some variants of fileIterate in GHC's libraries? ;-> I remember that someone proposed similar functions on haskell's lists some time ago, but can't remember who.

Best regards,
Tom

--
.signature: Too many levels of symbolic links

    {-# OPTIONS -funbox-strict-fields #-}

    import System.IO
    import Data.Array.IO
    import Data.Array.Base
    import Data.Word
    import Data.Int
    import List
    import Char

    main = fileIterate stdin wc' (C 0 0 0 False) >>= putStrLn . showC

    data C = C !Int !Int !Int !Bool deriving Show
    --         Line Word Char InWord

    showC (C l w c _) = show l ++ " " ++ show w ++ " " ++ show c

    wc' :: C -> Char -> C
    wc' (C l w c _)     '\n' = C (l+1) w     (c+1) False
    wc' (C l w c _)     ' '  = C l     w     (c+1) False
    wc' (C l w c _)     '\t' = C l     w     (c+1) False
    wc' (C l w c False) _    = C l     (w+1) (c+1) True
    wc' (C l w c True)  _    = C l     w     (c+1) True

    {-# INLINE fileIterate #-}
    fileIterate :: Handle -> (a -> Char -> a) -> a -> IO a
    fileIterate h f a0 = do
        buf <- newArray_ (0, bufSize - 1) :: IO (IOUArray Int Word8)
        let loop i n a
              | i `seq` n `seq` a `seq` False = undefined
              | i == n    = do n' <- hGetArray h buf bufSize
                               if n' == 0
                                 then return a
                                 else loop 0 n' a
              | otherwise = do c <- fmap (toEnum . fromEnum) (readArray buf i)
                               loop (i + 1) n (f a c)
        loop 0 0 a0
      where
        bufSize :: Int
        bufSize = 4096
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Thu, Sep 30, 2004 at 11:26:15AM +0100, Malcolm Wallace wrote:
> Just out of interest, I ran all of these suggested variations of
> the word count solution in Haskell head-to-head against each other.
> Here are the results, in seconds, on my machine (2.4GHz x86/Linux)
> for the suggested input (N=500) from the shootout site. All Haskell
> versions were compiled with ghc-5.04.2 -O2.
>
>     original space-leaky    2.257
>     Greg Buchholz           1.619 *
>     Sam Mason               0.594
>     Malcolm Wallace         0.457
>     Georg Martius           0.322 *
>     Tomasz Zielonka         0.047
>     linux 'wc'              0.085
>
> Those marked with a * gave the wrong number of words. The really
> interesting thing is that Tomasz's solution is twice as fast as the
> standard Gnu implementation!

I took Georg's, fixed the word count logic and added prettier printing, and then combined it with Sam's main (which I find more elegant, but others may find less straightforward). I think it strikes a good balance between efficiency and elegance.

Cheers,
Kevin.

--
    import IO

    main = getContents >>= putStrLn . showC . foldl wc' (C 0 0 0 False)

    data C = C !Int !Int !Int !Bool deriving Show
    --         Line Word Char InWord

    showC (C l w c _) = show l ++ " " ++ show w ++ " " ++ show c

    wc' :: C -> Char -> C
    wc' (C l w c _)     '\n' = C (l+1) w     (c+1) False
    wc' (C l w c _)     ' '  = C l     w     (c+1) False
    wc' (C l w c _)     '\t' = C l     w     (c+1) False
    wc' (C l w c False) _    = C l     (w+1) (c+1) True
    wc' (C l w c True)  _    = C l     w     (c+1) True
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Thu, Sep 30, 2004 at 11:26:15AM +0100, Malcolm Wallace wrote:
> Those marked with a * gave the wrong number of words. The really
> interesting thing is that Tomasz's solution is twice as fast as the
> standard Gnu implementation!

That's probably because Gnu wc is locale-aware.

Best regards,
Tom

--
.signature: Too many levels of symbolic links
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Just out of interest, I ran all of these suggested variations of the word count solution in Haskell head-to-head against each other. Here are the results, in seconds, on my machine (2.4GHz x86/Linux) for the suggested input (N=500) from the shootout site. All Haskell versions were compiled with ghc-5.04.2 -O2.

    original space-leaky    2.257
    Greg Buchholz           1.619 *
    Sam Mason               0.594
    Malcolm Wallace         0.457
    Georg Martius           0.322 *
    Tomasz Zielonka         0.047
    linux 'wc'              0.085

Those marked with a * gave the wrong number of words. The really interesting thing is that Tomasz's solution is twice as fast as the standard Gnu implementation!

Regards,
Malcolm
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Wed, 29 Sep 2004 15:58:24 -0700, Greg Buchholz <[EMAIL PROTECTED]> wrote:
> Just for the heck of it, I thought I'd try to write a naive 1-pass
> version of the program. It turned out to be 4.5 times slower than the
> original...
>
>     -- compiled with: ghc -O2 -ddump-simpl -fvia-c -o wc_fold wc_fold.hs
>
>     import IO
>
>     main = do file <- getContents
>               putStrLn (show (foldl wc (0,0,0) file))
>
>     wc :: (Int,Int,Int) -> Char -> (Int, Int, Int)
>     wc (l,w,c) '\n' = (l+1, w,   c+1)
>     wc (l,w,c) ' '  = (l,   w+1, c+1)
>     wc (l,w,c) x    = (l,   w,   c+1)

Using strictness is the trick to exploit the tail recursion. There is possibly a better way, but:

    data C = C !Int !Int !Int deriving Show

    wc' :: C -> Char -> C
    wc' (C l w c) '\n' = C (l+1) w     (c+1)
    wc' (C l w c) ' '  = C l     (w+1) (c+1)
    wc' (C l w c) x    = C l     w     (c+1)

is significantly faster. The results on a 12MB file:

original:

    1,699,541,396 bytes allocated in the heap
      340,796,888 bytes copied during GC
       75,415,872 bytes maximum residency (8 sample(s))
              146 Mb total memory in use
    Total time    5.92s  (  6.09s elapsed)

wc (yours): I have not enough memory :-(

wc':

    535,286,872 bytes allocated in the heap
    187,340,032 bytes copied during GC
        135,696 bytes maximum residency (130 sample(s))
              2 Mb total memory in use
    Total time    2.45s  (  2.53s elapsed)

Cheers,
Georg
--
Georg Martius, Tel: (+49 34297) 89434
--- http://www.flexman.homeip.net ---
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Greg Buchholz wrote:
> The algorithm isn't correct (it counts spaces instead of words), but
> anyone have advice for improving its performance?

You probably want some strictness annotations in there. . . When I tried the same thing, I came up with something like:

> import Char
>
> cclass c | isSpace c = (c == '\n', False)
>          | otherwise = (False, True)
>
> data Cdata = Cdata !Bool !Int !Int !Int
>     deriving Show
>
> combine (Cdata last l w c) (nl, iw) = Cdata iw l' w' (c + 1)
>     where l' = if nl then l + 1 else l
>           w' = if not last && iw then w + 1 else w
>
> wc = foldl combine (Cdata False 0 0 0) . map cclass
>
> main = getContents >>= putStrLn . show . wc

It seems to work in constant stack space, and gives the same answers (albeit not very beautifully) as my GNU copy of "wc".

> Is the problem caused by the laziness of the Ints inside the tuple?

I'm pretty sure that's what's causing it. I had quite a search around when my version was running out of memory, and everything seemed to suggest that, basically, the algorithm is building a massive list of "+1"s that only actually get executed when you try and print the totals at the end. Any comments from more authoritative sources?

> Here is the report from ghc with the '-ddump-simpl' option.

If anyone has any hints about how to read this output, I'd be grateful. It makes a bit of sense, but I don't really know what it "means". I.e. it's obviously the simplified parse tree and I can see how it relates back to the source (loosely), but attempting to figure out if something's going to be as leaky as a sieve or not is beyond me.
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Malcolm Wallace wrote:
> Graham Klyne <[EMAIL PROTECTED]> writes:
> > I can see that this requires the original file to be kept for 3-time
> > scanning, so enough memory for the entire file will be required. Is that
> > *the* problem to which you allude? I can't see any other problem
> > here.
>
> Yes, it is the main problem.
> In any case, for the shootout, this is patently a different algorithm
> to the one every other solution uses. All the other languages do a
> simple one-pass traversal of the file, in constant memory space. Why
> artificially handicap Haskell by insisting it do the job naively?

Just for the heck of it, I thought I'd try to write a naive 1-pass version of the program. It turned out to be 4.5 times slower than the original...

    -- compiled with: ghc -O2 -ddump-simpl -fvia-c -o wc_fold wc_fold.hs

    import IO

    main = do file <- getContents
              putStrLn (show (foldl wc (0,0,0) file))

    wc :: (Int,Int,Int) -> Char -> (Int, Int, Int)
    wc (l,w,c) '\n' = (l+1, w,   c+1)
    wc (l,w,c) ' '  = (l,   w+1, c+1)
    wc (l,w,c) x    = (l,   w,   c+1)

The algorithm isn't correct (it counts spaces instead of words), but anyone have advice for improving its performance? Is the problem caused by the laziness of the Ints inside the tuple? Here is the report from ghc with the '-ddump-simpl' option.

    Main.wc :: (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int)
               -> GHC.Base.Char
               -> (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int)
    [GlobalId]
    Arity 2 __P Main.$wwc NoCafRefs
    Str: DmdType U(LLL)U(L)m
    Main.wc = __inline_me
      (\ w :: (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int)
         w1 :: GHC.Base.Char ->
       case w of w2 { (ww, ww1, ww2) ->
       case w1 of w3 { GHC.Base.C# ww3 ->
       case Main.$wwc ww ww1 ww2 ww3 of ww4 { (# ww5, ww6, ww7 #) ->
       (ww5, ww6, ww7) } } })
    Rec {
    Main.$wlgo :: GHC.Base.Int
                  -> GHC.Base.Int
                  -> GHC.Base.Int
                  -> [GHC.Base.Char]
                  -> (# GHC.Base.Int, GHC.Base.Int, GHC.Base.Int #)

And here is the memory allocation report generated from passing '+RTS -M900m -k250m -sstderr' to the executable...

    875,150,472 bytes allocated in the heap
    149,008,464 bytes copied during GC
    250,845,060 bytes maximum residency (2 sample(s))
            478 collections in generation 0 ( 10.44s)
              2 collections in generation 1 (  0.00s)
            813 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    0.93s  (  1.03s elapsed)
    GC    time   10.44s  ( 11.38s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time   11.37s  ( 12.41s elapsed)

    %GC time      91.8%  (91.7% elapsed)
    Alloc rate    941,022,012 bytes per MUT second
    Productivity   8.2% of total user, 7.5% of total elapsed

Finally, here's the profiling report (-pT)...

    Wed Sep 29 15:21 2004 Time and Allocation Profiling Report (Final)

    wc_fold +RTS -M800m -k250m -pT -RTS

    total time  =  1.22 secs (61 ticks @ 20 ms)
    total alloc = 173,502,056 bytes (excludes profiling overheads)

    COST CENTRE  MODULE  %time  %alloc
    wc           Main     60.7    64.8
    main         Main     39.3    35.2

                                                 individual    inherited
    COST CENTRE  MODULE                  no. entries  %time %alloc  %time %alloc
    MAIN         MAIN                      1       0    0.0    0.0  100.0  100.0
    main         Main                    146       1   39.3   35.2  100.0  100.0
    wc           Main                    147 3048000   60.7   64.8   60.7   64.8
    CAF          GHC.Handle              103       3    0.0    0.0    0.0    0.0
    CAF          System.Posix.Internals   81       4    0.0    0.0    0.0    0.0

Any hints appreciated,

Greg Buchholz
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Graham Klyne <[EMAIL PROTECTED]> writes:

> > main = do file <- getContents
> >           putStrLn $ show (length $ lines file) ++ " " ++
> >                      show (length $ words file) ++ " " ++
> >                      show (length file)
> >
> > Space-leak or what?
>
> I can see that this requires the original file to be kept for 3-time
> scanning, so enough memory for the entire file will be required.  Is
> that *the* problem to which you allude?  I can't see any other
> problem here.

Yes, it is the main problem.  Don't forget, the shootout benchmark runs
this example over a very large input (15Mb).  Since the character-list
stored in memory for this file takes approximately 12 bytes per
character, that blows up to about 180Mb to store temporarily.  The
shootout performance figures reckon that ghc actually uses 223Mb in
total.

> And why would this put Haskell at a disadvantage?

Large live heap space means a large time spent in GC, trying to find
the needle that is actual garbage in the haystack of live pointers.  It
also increases the likelihood of cache misses and all kinds of other
bad memory effects.  In other words, wasting space is wasting time.
There is a good literature on heap profiling in Haskell which
demonstrates the benefits of keeping space usage small to improve time
performance.

In any case, for the shootout, this is patently a different algorithm
to the one every other solution uses.  All the other languages do a
simple one-pass traversal of the file, in constant memory space.  Why
artificially handicap Haskell by insisting it do the job naively?

Regards,
    Malcolm
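The 180Mb estimate is simple arithmetic; spelled out (the 12 bytes per
character is the approximation quoted above for a cons cell plus a boxed
Char; the exact overhead depends on GHC version and word size):

```haskell
-- Back-of-the-envelope heap cost of materialising the benchmark input
-- as a String.  The per-character figure is the approximation from the
-- message above, not a measurement.
inputMb, bytesPerChar, heapMb :: Int
inputMb      = 15             -- the shootout's input size in Mb
bytesPerChar = 12             -- cons cell + boxed Char, roughly
heapMb       = inputMb * bytesPerChar   -- = 180 Mb of live heap
```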
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Wed, Sep 29, 2004 at 01:41:03PM +0100, Graham Klyne wrote:
> > With code like this, I'm not surprised!
> >
> > main = do file <- getContents
> >           putStrLn $ show (length $ lines file) ++ " " ++
> >                      show (length $ words file) ++ " " ++
> >                      show (length file)
> >
> > Space-leak or what?
>
> Er, please excuse a dumb question, but I'm struggling to see the
> problem here.
>
> I can see that this requires the original file to be kept for 3-time
> scanning, so enough memory for the entire file will be required.

It would be nice if these scans were performed concurrently in a way
that would make memory usage constant, wouldn't it? ;)

Hmmm... maybe some simple tracking of garbage collection results would
suffice... Reschedule if the current thread doesn't help in collecting
garbage... But I am dreaming now... :)

> Is that *the* problem to which you allude?  I can't see any other
> problem here.  And why would this put Haskell at a disadvantage?

The only problem is that some people may draw incorrect conclusions.
Should we care?

I already submitted two improvements for the shootout in the last two
days (not included yet), but I don't know if it's worth the effort.
I remember SPJ's motto: ,,Avoid success at all cost''.  Is this motto
still valid?

http://research.microsoft.com/Users/simonpj/papers/haskell-retrospective/HaskellRetrospective-2.pdf

Best regards,
Tom

--
.signature: Too many levels of symbolic links
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
At 10:55 28/09/04 +0100, Malcolm Wallace wrote:

  Keith Wansbrough <[EMAIL PROTECTED]> writes:
  > I can't believe that a simple "wc" implementation should be
  > 570 times slower in Haskell than OCaml - could someone investigate
  > and fix the test?

  With code like this, I'm not surprised!

  main = do file <- getContents
            putStrLn $ show (length $ lines file) ++ " " ++
                       show (length $ words file) ++ " " ++
                       show (length file)

  Space-leak or what?

Er, please excuse a dumb question, but I'm struggling to see the
problem here.

I can see that this requires the original file to be kept for 3-time
scanning, so enough memory for the entire file will be required.  Is
that *the* problem to which you allude?  I can't see any other problem
here.

And why would this put Haskell at a disadvantage?

#g

Graham Klyne
For email: http://www.ninebynine.org/#Contact
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Tue, 28 Sep 2004, Tomasz Zielonka wrote:

> Changed readArray to unsafeRead, and it is 47 times faster now.
>
> I must say I am pleasantly surprised that GHC managed to unbox
> everything there was to unbox without much annotation.  For a 5MB
> file the program allocated only 192KB in the heap.  Especially the
> optimisation of higher-level constructs like
> 'fmap (toEnum . fromEnum) ...' is very nice.

Now I'd like to see an implementation which is both elegant and fast
... :-)
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Tue, Sep 28, 2004 at 12:49:52PM +0200, Tomasz Zielonka wrote:
> On Tue, Sep 28, 2004 at 12:01:11PM +0200, Tomasz Zielonka wrote:
> > On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
> > > I just saw this on the OCaml list (in a posting by "Rafael 'Dido'
> > > Sevilla" <[EMAIL PROTECTED]> in the "Observations on OCaml vs.
> > > Haskell" thread).  I can't believe that a simple "wc"
> > > implementation should be 570 times slower in Haskell than OCaml -
> > > could someone investigate and fix the test?
> >
> > No wonder it is so slow, this program looks like the result of some
> > ,,as slow as possible'' contest ;)
>
> It took me half an hour to make a version which is 41 times faster
> on a 5MB file.  It should be possible to make it even 2-3 times
> faster than this.

Changed readArray to unsafeRead, and it is 47 times faster now.

I must say I am pleasantly surprised that GHC managed to unbox
everything there was to unbox without much annotation.  For a 5MB file
the program allocated only 192KB in the heap.  Especially the
optimisation of higher-level constructs like 'fmap (toEnum . fromEnum)
...' is very nice.

Code attached.  Feel free to improve it.

Best regards,
Tom

--
.signature: Too many levels of symbolic links

import System.IO
import Data.Array.IO
import Data.Array.Base (unsafeRead)
import Data.Word
import Char
import List

wc :: Handle -> IO (Int, Int, Int)
wc h = do
    buf <- newArray_ (0, bufSize - 1) :: IO (IOUArray Int Word8)
    let wcLoop :: Char -> Int -> Int -> Int -> Int -> Int
               -> IO (Int, Int, Int)
        wcLoop prev nl nw nc i n
            | prev `seq` nl `seq` nw `seq` nc `seq` i `seq` n `seq`
              False = undefined
            | i == n = do
                n' <- hGetArray h buf bufSize
                if n' == 0
                    then return (nl, nw, nc)
                    else wcLoop prev nl nw nc 0 n'
            | otherwise = do
                c <- fmap (toEnum . fromEnum) (unsafeRead buf i)
                wcLoop c
                       (nl + if c == '\n' then 1 else 0)
                       (nw + if not (isSpace c) && isSpace prev
                               then 1 else 0)
                       (nc + 1)
                       (i + 1)
                       n
    wcLoop ' ' 0 0 0 0 0
  where
    bufSize :: Int
    bufSize = 8192

main = do
    (nl, nw, nc) <- wc stdin
    putStrLn $ concat $ intersperse " " $ map show [nl, nw, nc]
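The first guard in wcLoop is the pre-bang-patterns idiom for strict
accumulator arguments: a guard that seqs every argument and then yields
False can never succeed, but deciding that forces the arguments on every
call.  The same idiom in isolation (a hypothetical example with an
invented name, not part of the attached program):

```haskell
-- A strict sum via the seq-guard trick: the first equation's guard
-- never matches, but evaluating it forces `acc` on each iteration,
-- preventing a chain of (+) thunks from building up.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc _ | acc `seq` False = undefined
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs
```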
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Tue, Sep 28, 2004 at 12:01:11PM +0200, Tomasz Zielonka wrote:
> On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
> > I just saw this on the OCaml list (in a posting by "Rafael 'Dido'
> > Sevilla" <[EMAIL PROTECTED]> in the "Observations on OCaml vs.
> > Haskell" thread).  I can't believe that a simple "wc" implementation
> > should be 570 times slower in Haskell than OCaml - could someone
> > investigate and fix the test?
>
> No wonder it is so slow, this program looks like the result of some
> ,,as slow as possible'' contest ;)

It took me half an hour to make a version which is 41 times faster on
a 5MB file.  It should be possible to make it even 2-3 times faster
than this.

Best regards,
Tom

--
.signature: Too many levels of symbolic links
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
> I just saw this on the OCaml list (in a posting by "Rafael 'Dido'
> Sevilla" <[EMAIL PROTECTED]> in the "Observations on OCaml vs.
> Haskell" thread).  I can't believe that a simple "wc" implementation
> should be 570 times slower in Haskell than OCaml - could someone
> investigate and fix the test?

No wonder it is so slow, this program looks like the result of some
,,as slow as possible'' contest ;)

main = do file <- getContents
          putStrLn $ show (length $ lines file) ++ " " ++
                     show (length $ words file) ++ " " ++
                     show (length file)

Best regards,
Tom

--
.signature: Too many levels of symbolic links
Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results
Keith Wansbrough <[EMAIL PROTECTED]> writes:

> I can't believe that a simple "wc" implementation should be
> 570 times slower in Haskell than OCaml - could someone investigate
> and fix the test?

With code like this, I'm not surprised!

main = do file <- getContents
          putStrLn $ show (length $ lines file) ++ " " ++
                     show (length $ words file) ++ " " ++
                     show (length file)

Space-leak or what?

Regards,
    Malcolm
[Haskell-cafe] OCaml list sees abysmal Language Shootout results
I just saw this on the OCaml list (in a posting by "Rafael 'Dido'
Sevilla" <[EMAIL PROTECTED]> in the "Observations on OCaml vs. Haskell"
thread).  I can't believe that a simple "wc" implementation should be
570 times slower in Haskell than OCaml - could someone investigate and
fix the test?

--KW 8-)

http://caml.inria.fr/archives/200409/msg00485.html

> > 2. Haskell strings are lists of characters
> >
> > It's annoying that strings aren't normally processed this way in
> > OCaml, and even more annoying that (^) or (::) cannot be used in
> > pattern matching over strings.  I like Haskell's approach.  The list
> > concatenation operator is the same as the string concatenation
> > operator in Haskell.
>
> This is something of an efficiency/elegance tradeoff.  Making strings
> act like lists means potentially boxing *every character in the
> string*.  In other words, it's potentially a very expensive way of
> doing business.  Paul Graham was mulling over this kind of tradeoff
> in his design of Arc, as I recall.  Another language that does this
> type of thing is Erlang, and both languages seem to be significantly
> slower than OCaml in string handling, at least as far as this site
> goes:
>
> http://shootout.alioth.debian.org/
>
> For the word count benchmark OCaml scores 0.1850 seconds, while GHC
> is a dismal last place at 105.2110 seconds!  Even the bytecode ocaml
> is an order of magnitude faster.  The word frequency benchmark also
> shows this kind of poor string handling performance for Haskell, with
> OCaml scoring 0.5669 seconds, while GHC scores a truly dismal 6.4540,
> more than an order of magnitude slower, and even the bytecode ocaml
> is faster at 4.2644 seconds.
>
> All in all, it would appear that Haskell's approach has been
> expensive in terms of performance, if the benchmarks are to be taken
> at face value.  Such are the tradeoffs language designers have to
> make.