Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-10-06 Thread Ketil Malde
Greg Buchholz [EMAIL PROTECTED] writes:

 I've been looking at the other shootout results (with the hope of
 learning something about making haskell programs faster/less memory
 hungry) and I couldn't quite figure out why the Hashes, part II test
 comsumes so much memory ( http://shootout.alioth.debian.org/bench/hash2/ ). 
 So I started to try heap profiling, and generated the following graphs
 for the different types of profiles...

 biography = http://sleepingsquirrel.org/haskell/hash2_b.ps
 retainer  = http://sleepingsquirrel.org/haskell/hash2_r.ps
 closure   = http://sleepingsquirrel.org/haskell/hash2_d.ps
 type  = http://sleepingsquirrel.org/haskell/hash2_y.ps
 cost cntr = http://sleepingsquirrel.org/haskell/hash2_c.ps

 ...but I have a hard time figuring out how to prevent something like
 stg_ap_3_upd_info or void cells from consuming so much memory.

One thing you could do, is to move the pure definitions (constants and
functions) out of the monad.  This will make them separate cost
centres, with their own profile information.  I toyed with this, but
admittedly, it didn't change much in this case.  I think it is better
style, though.

A simple way to improve speed marginally, is to specify Int instead of
letting things default to Integer.  A more complex way, saving about
60% of the time, is to use unboxed arrays instead of strings for the
keys - memory consumption seems to be the same, though. 

To get memory consumption down, I tried a strict update function:

   update k fm = let x = (get hash1 k + get fm k) 
 in x `seq` addToFM fm k x

which slowed the program down(!), but reduced memory consumption from
about 25Mb to 1.5Mb.  So it seems that the memory consumption is due
to unevaluated values in the FMs.

BTW, I looked at the shootout web pages, but I couldn't find the
specification for any of the benchmarks.  What is and isn't allowed? 

-kzm


import System (getArgs) 
import Data.FiniteMap 
import Data.Array.Unboxed
import Maybe

type Key = UArray Int Char
type Map = FiniteMap (UArray Int Char) Int

hash1, hash2 :: Map
hash1 = listToFM $ zip keys [0..] 
hash2 = listToFM $ zip keys (repeat 0) 

keys :: [Key]
keys = map (\x - listArray (1,4+length (show x)) (foo_ ++ show x)) [0..] 
get :: Map - Key - Int
get fm k = fromJust $ lookupFM fm k 

update :: Key - Map - Map
update k fm = let x = (get hash1 k + get fm k) in x `seq` addToFM fm k x

foo_1 = keys!!1
foo_ = keys!!

main = do 
 [n] - getArgs  
 let res = foldr update hash2 (concat $ replicate (read n) keys) 
 putStrLn $ unwords $ map show [get hash1 foo_1,
get hash1 foo_, 
get res foo_1, 
get res foo_] 

-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-10-06 Thread Ketil Malde
Ketil Malde [EMAIL PROTECTED] writes:

 To get memory consumption down, I tried a strict update function:

update k fm = let x = (get hash1 k + get fm k) 
  in x `seq` addToFM fm k x

 which slowed the program down(!), 

I wonder if this isn't due to never evaluating the values for
foo_2 to foo_9998 because of laziness?

 BTW, I looked at the shootout web pages, but I couldn't find the
 specification for any of the benchmarks.  What is and isn't allowed? 

For instance, changing the order of of the updates shaves another
10-20% off the time (because of cache-friendliness, I suppose):

  - let res = foldr update hash2 (concat $ replicate (read n) keys)
  + let res = foldr update hash2 (concat $ map (replicate (read n)) keys)

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-10-05 Thread Greg Buchholz
Keith Wansbrough wrote:
 I just saw this on the OCaml list (in a posting by Rafael 'Dido' 
 Sevilla [EMAIL PROTECTED] in the Observations on OCaml vs. Haskell 
 thread).  I can't believe that a simple wc implementation should be 
 570 times slower in Haskell than OCaml - could someone investigate and 
 fix the test?

I've been looking at the other shootout results (with the hope of
learning something about making haskell programs faster/less memory
hungry) and I couldn't quite figure out why the Hashes, part II test
comsumes so much memory ( http://shootout.alioth.debian.org/bench/hash2/ ). 
So I started to try heap profiling, and generated the following graphs
for the different types of profiles...

biography = http://sleepingsquirrel.org/haskell/hash2_b.ps
retainer  = http://sleepingsquirrel.org/haskell/hash2_r.ps
closure   = http://sleepingsquirrel.org/haskell/hash2_d.ps
type  = http://sleepingsquirrel.org/haskell/hash2_y.ps
cost cntr = http://sleepingsquirrel.org/haskell/hash2_c.ps

...but I have a hard time figuring out how to prevent something like
stg_ap_3_upd_info or void cells from consuming so much memory.
Anyone have pointers on how to best use the profile information?  I'm
still trying to digest Heap Profiling for Space Efficiency
http://portal.acm.org/citation.cfm?id=734156 
Are there any other related papers out there?  (Of course it might be
the case that I need a FiniteMap tutorial)

Here's the code in question...

import System (getArgs)
import Data.FiniteMap

main = do
 [n] - getArgs 
 let get fm k = lookupWithDefaultFM fm 0 k
 let keys = map (\x - foo_ ++ show x) [0..]
 let hash1 = listToFM $ zip keys [0..]
 let hash2 = listToFM $ zip keys (repeat 0)
 let update k fm = addToFM_C (+) fm k (get hash1 k)
 let res = foldr update hash2 (concat $ replicate (read n) keys)
 putStrLn $ unwords $ map show [get hash1 foo_1,
get hash1 foo_,
get res foo_1,
get res foo_]



Thanks,

Greg Buchholz
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Malcolm Wallace
Just out of interest, I ran all of these suggested variations of
the word count solution in Haskell head-to-head against each other.
Here are the results, in seconds, on my machine (2.4GHz x86/Linux)
for the suggested input (N=500) from the shootout site.  All Haskell
versions were compiled with ghc-5.04.2 -O2.

original space-leaky2.257
Greg Buchholz   1.619   *
Sam Mason   0.594
Malcolm Wallace 0.457
Georg Martius   0.322   *
Tomasz Zielonka 0.047
linux 'wc'  0.085

Those marked with a * gave the wrong number of words.  The really
interesting thing is that Tomasz's solution is twice as fast as the
standard Gnu implementation!

Regards,
Malcolm
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Tomasz Zielonka
On Thu, Sep 30, 2004 at 11:26:15AM +0100, Malcolm Wallace wrote:
 Those marked with a * gave the wrong number of words.  The really
 interesting thing is that Tomasz's solution is twice as fast as the
 standard Gnu implementation!

That's probably because Gnu wc is locale aware.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Kevin Everets
On Thu, Sep 30, 2004 at 11:26:15AM +0100, Malcolm Wallace wrote:

 Just out of interest, I ran all of these suggested variations of
 the word count solution in Haskell head-to-head against each other.
 Here are the results, in seconds, on my machine (2.4GHz x86/Linux)
 for the suggested input (N=500) from the shootout site.  All Haskell
 versions were compiled with ghc-5.04.2 -O2.
 
   original space-leaky2.257
   Greg Buchholz   1.619   *
   Sam Mason   0.594
   Malcolm Wallace 0.457
   Georg Martius   0.322   *
   Tomasz Zielonka 0.047
   linux 'wc'  0.085
 
 Those marked with a * gave the wrong number of words.  The really
 interesting thing is that Tomasz's solution is twice as fast as the
 standard Gnu implementation!

I took Georg's, fixed the word count logic and added prettier
printing, and then combined it with Sam's main (which I find more
elegant, but others may find less straightforward).  I think it
strikes a good balance between efficiency and elegance.

Cheers,

Kevin.

--

import IO

main = getContents = putStrLn . showC . foldl wc' (C 0 0 0 False)

data C = C !Int !Int !Int !Bool deriving Show
-- Line Word Char InWord

showC (C l w c _) = show l ++   ++ show w ++   ++ show c

wc' :: C  - Char - C
wc' (C l w c _) '\n' = C (l+1) w (c+1) False
wc' (C l w c _) ' '  = C l w (c+1) False
wc' (C l w c _) '\t' = C l w (c+1) False
wc' (C l w c False) _= C l (w+1) (c+1) True
wc' (C l w c True)  _= C l w (c+1) True
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Tomasz Zielonka
On Thu, Sep 30, 2004 at 09:49:46AM -0400, Kevin Everets wrote:
 I took Georg's, fixed the word count logic and added prettier
 printing, and then combined it with Sam's main (which I find more
 elegant, but others may find less straightforward).  I think it
 strikes a good balance between efficiency and elegance.

Then how about a solution like this: I took your program but used
my fast fileIterate instead of ,,foldl over getContents''. 
I also added {-# OPTIONS -funbox-strict-fields #-}, and played a bit
to get the best optimisations from GHC.

It's about 7 times faster this way, but it's still two times slower than
the solution I sent to shootout.

Devilish plan: Maybe we could have some variants of fileIterate in GHC's
libraries? ;-

I remember that someone proposed similar functions on haskell's lists
some time ago, but can't remember who.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links

{-# OPTIONS -funbox-strict-fields #-}

import System.IO
import Data.Array.IO
import Data.Array.Base
import Data.Word
import Data.Int
import List
import Char

main = fileIterate stdin wc' (C 0 0 0 False) = putStrLn . showC

data C = C !Int !Int !Int !Bool deriving Show
-- Line Word Char InWord

showC (C l w c _) = show l ++   ++ show w ++   ++ show c

wc' :: C  - Char - C
wc' (C l w c _) '\n' = C (l+1) w (c+1) False
wc' (C l w c _) ' '  = C l w (c+1) False
wc' (C l w c _) '\t' = C l w (c+1) False
wc' (C l w c False) _= C l (w+1) (c+1) True
wc' (C l w c True)  _= C l w (c+1) True



{-# INLINE fileIterate #-}

fileIterate :: Handle - (a - Char - a) - a - IO a
fileIterate h f a0 = do
buf - newArray_ (0, bufSize - 1) :: IO (IOUArray Int Word8)
let loop i n a
| i `seq` n `seq` a `seq` False = undefined
| i == n =
do  n' - hGetArray h buf bufSize
if n' == 0
then return a
else loop 0 n' a
| otherwise =
do  c - fmap (toEnum . fromEnum) (readArray buf i)
loop (i + 1) n (f a c)
loop 0 0 a0
  where
bufSize :: Int
bufSize = 4096

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results]

2004-09-30 Thread Greg Buchholz
Sam Mason wrote:
 
 You probably want some strictness annotations in there. . .

Now we're getting somewhere.  When I replace the tuples with my own
(strict) data structure, it gets about 7.5 times faster than the original
shootout example (or about 24 times faster than the version with
tuples).  I get another 2x speedup when I pass '+RTS -G1' to the
executable.  So the version below is about 15 times faster than the
original using the 3MB data file from the shootout. (Now we're only
about 40x slower than ocaml).  Don't forget to turn on '-funbox-strict-fields' 
for an additional improvement.

-- Compile with...
-- ghc -O2 -ddump-simpl -fvia-c -funbox-strict-fields -o wc_fold2 wc_fold2.hs
-- execute as...  ./wc_fold2 +RTS -G1 input.txt
import IO

main = do   file - getContents
putStrLn $ show (foldl wc (St 0 0 0) file)

data Stuple = St !Int !Int !Int  deriving Show

wc (St l w c) '\n' = St (l+1) w(c+1)
wc (St l w c) ' '  = St  l   (w+1) (c+1)
wc (St l w c)  x   = St  lw(c+1)

...It still seems like it's using a lot of memory (or at least doing a
lot of garbage collecting).  But it it still vastly better than before.
Is there any way to reduce this more?  (60,000,000 bytes divided by
3,000,000 chars = 20 bytes per char).  Here's the memory usage report...


 61,387,752 bytes allocated in the heap
 11,837,148 bytes copied during GC
 99,780 bytes maximum residency (234 sample(s))

234 collections in generation 0 (  0.11s)

  1 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time0.15s  (  0.15s elapsed)
  GCtime0.11s  (  0.14s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time0.26s  (  0.29s elapsed)

  %GC time  42.3%  (48.3% elapsed)

  Alloc rate409,251,680 bytes per MUT second

  Productivity  57.7% of total user, 51.7% of total elapsed


Greg Buchholz
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Graham Klyne
At 16:56 30/09/04 +0200, Tomasz Zielonka wrote:
Then how about a solution like this: I took your program but used
my fast fileIterate instead of ,,foldl over getContents''.
I also added {-# OPTIONS -funbox-strict-fields #-}, and played a bit
to get the best optimisations from GHC.
It's about 7 times faster this way, but it's still two times slower than
the solution I sent to shootout.
Devilish plan: Maybe we could have some variants of fileIterate in GHC's
libraries? ;-
Two responses:
1. I agree that providing the right kind of library functions (and material 
explaining how to use them) maybe a key to getting efficient code without 
losing high-level forms of expression.

2. Your fileIterator certainly looks nicer (to me) than your other 
solution, but...

Tagging along with this debate, I found myself wondering if, in order to 
get performance comparable to other languages, it is really necessary to 
write code like code in other languages.  E.g., I thought one of the 
lessons of John Hughes' Why functional Programming matters was that one 
can achieve greater efficiencies by climbing higher rather than dropping 
down to the level of other languages.  Your fileIterate looks to me like a 
step in the right direction.

But I did wonder if it wouldn't be possible to also abstract out the I/O 
element of your 'fileIterate', using instead something like:
  streamIterate :: [b] - (a - b - a) - a - IO a

(I was originally thinking of something like:
  streamIterate :: (c - (b,c)) - c - (a - b - a) - a - IO a
where the first argument is a function that takes a sequence generator and 
returns the next member of the sequence+new generator, and the 2nd arg is 
the initial generator.)

For such an approach to be useful, I think it would also be important to 
have variations of functions like length, lines, words that can be combined 
to make a function like your wc'.

#g

Graham Klyne
For email:
http://www.ninebynine.org/#Contact
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Tomasz Zielonka
On Thu, Sep 30, 2004 at 05:40:58PM +0100, Graham Klyne wrote:
 2. Your fileIterator certainly looks nicer (to me) than your other 
 solution, but...

It looks nicer to me too.

 Tagging along with this debate, I found myself wondering if, in order to 
 get performance comparable to other languages, it is really necessary to 
 write code like code in other languages.

Maybe it's not necessary to get the comparable performance, but it often
is if you want to get the best performance possible. Of course I
wouldn't mind if GHC optimised high level code so well, that we wouldn't
have to do it, but I guess it's just not that easy. 

What I like about GHC is that I can start from simple, high-level,
sometimes slow solutions, but if there are efficiency problems, there is
a big chance that I can solve them without switching the language.

 But I did wonder if it wouldn't be possible to also abstract out the I/O 
 element of your 'fileIterate', using instead something like:
   streamIterate :: [b] - (a - b - a) - a - IO a

It seems to be a variant of foldl. You can eliminate IO from return
type, or is there some reason for it?

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Graham Klyne
At 19:39 30/09/04 +0200, Tomasz Zielonka wrote:
What I like about GHC is that I can start from simple, high-level,
sometimes slow solutions, but if there are efficiency problems, there is
a big chance that I can solve them without switching the language.
That's a very good point, I think.  One to hang on to.
 But I did wonder if it wouldn't be possible to also abstract out the I/O
 element of your 'fileIterate', using instead something like:
   streamIterate :: [b] - (a - b - a) - a - IO a
It seems to be a variant of foldl. You can eliminate IO from return
type, or is there some reason for it?
Doh!  (Why didn't I spot that?)  All roads lead to Rome, or something like 
that?  There seems to be a recurring tension between how much to specialize 
and how much to generalize.

Maybe it should be something like:
  streamIterate :: (Monad m) = [b] - (a - b - m a) - a - m a
?
Er, but that's similarly a variation of foldM, right?
Or maybe my earlier idea was closer:
  streamIterate :: (Monad m) =
  (c - m (b,c)) - c - (a - b - m a) - a - m a
?
Hmmm... I feel like a (intellectual) bull-in-a-china-shop here.  I'm 
blundering about on the trail of a delicate and elegant idea that I'm sure 
others could dissect far more clearly.

What I'm trying to capture (I think) is that there's some baggage to do 
with accessing the raw data and emitting the desired result that needs to 
be carried along (or interleaved) with the required computation on that 
data.  Does this make any sense, or am I descending into farce here?

#g

Graham Klyne
For email:
http://www.ninebynine.org/#Contact
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Georg Martius
Hi folks,
On Thu, 30 Sep 2004 01:02:54 +0100, Sam Mason [EMAIL PROTECTED] wrote:
Greg Buchholz wrote:
The algorithm isn't correct (it counts spaces instead of words), but
anyone have advice for improving its performance?
You probably want some strictness annotations in there. . .
snip
Last night as I have tried to improve Gregs wc in a simple fashion and came up with 
the same idea to make a new data type with strict fields. I thought why one couldn't 
add some kind of strictness annotation to the function type. First attempt:
wc :: !(Int,Int,Int) - Char - (Int, Int, Int)
As far as I know the compiler does strictness analysis to find strict arguments in a 
function anyway. Would it make sense to allow this kind of explicit strictness? Where 
are the problems with that?
I mean lazyness is really useful and it is our best friend in this kind of 
application, since we can make stream processing without implementing a buffer and so 
on. On the other hand one gets occasionally traped by it and it is not allways easy to 
grasp why.
Some more general comment: The code for the shootout doesn't need to be extremly fast 
in my eyes, it needs to be elegant and reasonable at performance and memory 
consuptions (In this order). I don't want to say that Thomaszs solution is bad, but it 
is not a typical Haskell style application. If someone (not haskeller) looks at the 
implementation it should be very obvious and clear.
The last few weeks the list have been full of performance questions (including my own 
ones) and it is really a pitty that it is such an issue. I believe that most problems 
occuring with performance and memory consumptions could be easily solved by partial 
and explicit strictness. Please enlight me if I am wrong.
Regards,
 Georg
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Greg Buchholz
Malcolm Wallace wrote:
 Here are the results, in seconds, on my machine (2.4GHz x86/Linux)
 for the suggested input (N=500) from the shootout site.  All Haskell
 versions were compiled with ghc-5.04.2 -O2.

I thought I'd take a stab at timing a few of the examples with
different compiler versions to see what difference that would make
(ghc-6.2.1 vs. ghc-6.3.20040928).  I compared Kevin Everets' version with
the two Tomasz Zielonka versions.  I ran the test with N=2500 (i.e. 2500
copies of the original file, which is what is apparently used in the
shootout) on my AthlonXP 1600 under x86/Linux.

6.2.1   6.3.20040928
--- ---
Kevin   3.615s  3.156s  
Kevin  (+RTS -G1)   1.666s  1.405s
Tomasz (pretty) 0.725s  0.481s
Tomasz (fast)   0.403s  0.430s

Interesting to see the speed increase going from 6.2.1 to 6.3 for
Tomasz' pretty example.  Anyone have an explaination for the 2x speed
increase for running Kevin's version with '+RTS -G1'?

(And for reference, here's the results on my machine for the perl and
gcc versions of the test and gnu/wc)

perl-5.8.4  0.535s
gcc-3.4.2   0.102s
gnu/wc  0.435s

Greg Buchholz

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Andrew Cheadle
Hi Greg

Anyone have an explaination for the 2x speed
increase for running Kevin's version with '+RTS -G1'?

+RTS -Sstderr -RTS and +RTS -sstderr -RTS will probably indicate why.
I'd be surprised if the amount of data copied for the semi-space
collector isn't much less than for the generational.

Chances are that data is dying off very quickly and very little is
being copied for the semi-space collector - the allocation area
has a variable sizing policy and this size of allocation area is
sufficient for the majority of the data to die off before it is filled
and a GC kicks in - hence very little is copied. However, for the
generational collector, the nursery is of fixed size. Here, the lifetime
of the data is probably longer than the time it takes for the nursery to
be filled and a minor GC kicks in and the data promoted to generation 1
(hence copied) where it then probably dies off but can't be collected
until a major collection kicks in.

Probably, or something like that ;-)

Cheers

Andy

*
*  Andrew Cheadleemail:  [EMAIL PROTECTED] *
*  Department of Computing   http://www.doc.ic.ac.uk/~amc4/ *
*  Imperial College *
*  University of London *
*
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Greg Buchholz
Andrew Cheadle wrote:
 
 +RTS -Sstderr -RTS and +RTS -sstderr -RTS will probably indicate why.
 I'd be surprised if the amount of data copied for the semi-space
 collector isn't much less than for the generational.

Ahh. Data copied with '-G1' = 58MB vs. 203MB without.  For posterities 
sake, here are the numbers...

With '-G1'
---
306,616,872 bytes allocated in the heap
 58,844,344 bytes copied during GC
 99,316 bytes maximum residency (1169 sample(s))

   1169 collections in generation 0 (  0.62s)

  1 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time0.68s  (  0.71s elapsed)
  GCtime0.62s  (  0.68s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time1.30s  (  1.39s elapsed)

  %GC time  47.7%  (48.9% elapsed)

  Alloc rate450,907,164 bytes per MUT second

  Productivity  52.3% of total user, 48.9% of total elapsed


Without
---
306,616,872 bytes allocated in the heap
203,339,812 bytes copied during GC
109,088 bytes maximum residency (131 sample(s))

   1169 collections in generation 0 (  2.22s)
131 collections in generation 1 (  0.05s)

  2 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time0.79s  (  0.92s elapsed)
  GCtime2.27s  (  2.23s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time3.06s  (  3.15s elapsed)

  %GC time  74.2%  (70.8% elapsed)

  Alloc rate388,122,622 bytes per MUT second

  Productivity  25.8% of total user, 25.1% of total elapsed


Greg Buchholz
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-30 Thread Greg Buchholz
Georg Martius wrote:
 Some more general comment: The code for the shootout doesn't need to be 
 extremly fast in my eyes, it needs to be elegant and reasonable at 
 performance and memory consuptions (In this order). I don't want to say 
 that Thomaszs solution is bad, but it is not a typical Haskell style 
 application. If someone (not haskeller) looks at the implementation it 
 should be very obvious and clear.

It might also be nice if the code would run under the other haskell
compliers like Hugs and nhc98 right out-of-the-box.


Greg Buchholz 
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-29 Thread Graham Klyne
At 10:55 28/09/04 +0100, Malcolm Wallace wrote:
Keith Wansbrough [EMAIL PROTECTED] writes:
   I can't believe that a simple wc implementation should be
 570 times slower in Haskell than OCaml - could someone investigate and
 fix the test?
With code like this, I'm not surprised!
main = do file - getContents
  putStrLn $ show (length $ lines file) ++   ++
 show (length $ words file) ++   ++
 show (length file)
Space-leak or what?
Er, please excuse a dumb question, but I'm struggling to see the problem here.
I can see that this requires the original file to be kept for 3-time 
scanning,  so enough memory for the entire file will be required.  Is that 
*the* problem to which you allude?  I can't see any other problem 
here.  And why would this put Haskell at a disadvantage?

#g

Graham Klyne
For email:
http://www.ninebynine.org/#Contact
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-29 Thread Tomasz Zielonka
On Wed, Sep 29, 2004 at 01:41:03PM +0100, Graham Klyne wrote:
 With code like this, I'm not surprised!
 
 main = do file - getContents
   putStrLn $ show (length $ lines file) ++   ++
  show (length $ words file) ++   ++
  show (length file)
 
 Space-leak or what?
 
 Er, please excuse a dumb question, but I'm struggling to see the problem 
 here.
 
 I can see that this requires the original file to be kept for 3-time 
 scanning,  so enough memory for the entire file will be required.

It would be nice if these scans were performed concurrently in a way
that would make memory usage constant, wouldn't it? ;)

Hmmm... maybe some simple tracking of garbage collection results would
suffice... Reschedule if the current thread doesn't help in collecting
garbage... But I am dreaming now... :)

 Is that *the* problem to which you allude?  I can't see any other
 problem here.  And why would this put Haskell at a disadvantage?

The only problem is that some people may draw incorrect conclusions.
Should we care? I already submitted two improvements for shootout in
the last two days (not included yet), but I don't know if it's worth
the effort.

I remember SPJ's motto: ,,Avoid success at all cost''. Is this motto
still valid?


http://research.microsoft.com/Users/simonpj/papers/haskell-retrospective/HaskellRetrospective-2.pdf

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-29 Thread Malcolm Wallace
Graham Klyne [EMAIL PROTECTED] writes:

  main = do file - getContents
putStrLn $ show (length $ lines file) ++   ++
   show (length $ words file) ++   ++
   show (length file)
 
 Space-leak or what?
 
 I can see that this requires the original file to be kept for 3-time 
 scanning,  so enough memory for the entire file will be required.  Is that 
 *the* problem to which you allude?  I can't see any other problem 
 here.

Yes, it is the main problem.  Don't forget, the shootout benchmark
runs this example over a very large input (15Mb).  Since the
character-list stored in memory for this file takes approximately 12
bytes per character, that blows up to about 180Mb to store temporarily.
The shootout performance figures reckon that ghc actually uses 223Mb
in total.

  And why would this put Haskell at a disadvantage?

Large live heap space means a large time spent in GC, trying to find
the needle that is actual garbage in the haystack of live pointers.
It also increases the likelihood of cache misses and all kinds of
other bad memory effects.  In other words, wasting space is wasting
time.  There is a good literature on heap profiling in Haskell which
demonstrates the benefits of keeping space usage small to improve
time performance.

In any case, for the shootout, this is patently a different algorithm
to the one every other solution uses.  All the other languages do a
simple one-pass traversal of the file, in constant memory space.  Why
artificially handicap Haskell by insisting it do the job naively?

Regards,
Malcolm
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-29 Thread Sam Mason
Greg Buchholz wrote:
The algorithm isn't correct (it counts spaces instead of words), but
anyone have advice for improving its performance?

You probably want some strictness annotations in there. . .  When I
tried the same thing, I came up with something like:

 import Char;

 cclass c | isSpace c = (c == '\n', False)
  | otherwise = (False, True)

 data Cdata = Cdata !Bool !Int !Int !Int
   deriving Show

 combine (Cdata last l w c) (nl, iw) = Cdata iw l' w' (c + 1)
 where l' = if nl then l + 1 else l
   w' = if not last  iw then w + 1 else w

 wc = foldl combine (Cdata False 0 0 0) . map cclass

 main = getContents = putStrLn . show . wc

It seems to work in constant stack space, and gives the same answers
(albeit not very beautifully) as my GNU copy of wc.

Is the problem
caused by the laziness of the Int's inside the tuple?

I'm pretty sure that's what's causing it.  I had quite a search around
when my version was running out of memory and everything seemed to
suggest that, basically, the algorithm is building a massive list of
+1s that only actually get executed when the you try and print the
totals at the end.

Any comments from more authoritative sources?

Here is the
report from ghc with the '-ddump-simpl' option.

If anyone has any hints about how to read this output, I'd be
grateful.  It makes a bit of sense, but I don't really know what it
means.  I.e. it's obviously the simplified parse tree and I can see
how it relates back to the source (loosely), but attempting to figure
out if something's going to be as leaky as a sieve or not is beyond
me.
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Keith Wansbrough
I just saw this on the OCaml list (in a posting by Rafael 'Dido' 
Sevilla [EMAIL PROTECTED] in the Observations on OCaml vs. Haskell 
thread).  I can't believe that a simple wc implementation should be 
570 times slower in Haskell than OCaml - could someone investigate and 
fix the test?

--KW 8-)

http://caml.inria.fr/archives/200409/msg00485.html

  2. Haskell strings are lists of characters
  
  It's annoying that strings aren't normally processed this way in OCaml, 
  and even more annoying that (^) or (::) cannot be used in pattern 
  matching over strings.  I like Haskell's approach.  The list 
  concatenation operator is the same as the string concatenation operator 
  in Haskell.
  
 
 This is something of an efficiency/elegance tradeoff.  Making strings
 act like lists means potentially boxing *every character in the string*.
 In other words, it's potentially a very expensive way of doing business.
 Paul Graham was mulling over this kind of tradeoff in his design of Arc,
 as I recall.  Another language that does this type of thing is Erlang,
 and both languages seem to be significantly slower than OCaml in string
 handling, at least as far as this site goes:
 
 http://shootout.alioth.debian.org/
 
 For the word count benchmark OCaml scores 0.1850 seconds, while GHC is a
 dismal last place at 105.2110 seconds!  Even the bytecode ocaml is an
 order of magnitude faster.  The word frequency benchmark also shows this
 kind of poor string handling performance for Haskell, with OCaml scoring
 0.5669 seconds, while GHC scores a truly dismal 6.4540, more than an
 order of magnitude slower, and even the bytecode ocaml is faster at
 4.2644 seconds.
 
 All in all, it would appear that Haskell's approach has been expensive
 in terms of performance, if the benchmarks are to be taken at face
 value.  Such are the tradeoffs language designers have to make.


___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Malcolm Wallace
Keith Wansbrough [EMAIL PROTECTED] writes:

   I can't believe that a simple wc implementation should be 
 570 times slower in Haskell than OCaml - could someone investigate and 
 fix the test?

With code like this, I'm not surprised!

main = do file - getContents
  putStrLn $ show (length $ lines file) ++   ++
 show (length $ words file) ++   ++
 show (length file)

Space-leak or what?
Regards,
Malcolm
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Tomasz Zielonka
On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
 I just saw this on the OCaml list (in a posting by Rafael 'Dido' 
 Sevilla [EMAIL PROTECTED] in the Observations on OCaml vs. Haskell 
 thread).  I can't believe that a simple wc implementation should be 
 570 times slower in Haskell than OCaml - could someone investigate and 
 fix the test?

No wonder it is so slow, this program looks as a result of some ,,as
slow as possible'' contest ;)

 main = do file - getContents
   putStrLn $ show (length $ lines file) ++   ++
  show (length $ words file) ++   ++
  show (length file)

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Tomasz Zielonka
On Tue, Sep 28, 2004 at 12:01:11PM +0200, Tomasz Zielonka wrote:
 On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
  I just saw this on the OCaml list (in a posting by Rafael 'Dido' 
  Sevilla [EMAIL PROTECTED] in the Observations on OCaml vs. Haskell 
  thread).  I can't believe that a simple wc implementation should be 
  570 times slower in Haskell than OCaml - could someone investigate and 
  fix the test?
 
 No wonder it is so slow, this program looks as a result of some ,,as
 slow as possible'' contest ;)

It took me half an hour to make a version which is 41 times faster
on a 5MB file. It should be possible to make it even 2-3 times faster
than this.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Tomasz Zielonka
On Tue, Sep 28, 2004 at 12:49:52PM +0200, Tomasz Zielonka wrote:
 On Tue, Sep 28, 2004 at 12:01:11PM +0200, Tomasz Zielonka wrote:
  On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
   I just saw this on the OCaml list (in a posting by Rafael 'Dido' 
   Sevilla [EMAIL PROTECTED] in the Observations on OCaml vs. Haskell 
   thread).  I can't believe that a simple wc implementation should be 
   570 times slower in Haskell than OCaml - could someone investigate and 
   fix the test?
  
  No wonder it is so slow, this program looks as a result of some ,,as
  slow as possible'' contest ;)
 
 It took me half an hour to make a version which is 41 times faster
 on a 5MB file. It should be possible to make it even 2-3 times faster
 than this.

Changed readArray to unsafeRead, and it is 47 times faster now.

I must say I am pleasantly surprised that GHC managed to unbox
everything there was to unbox without much annotations. For 5MB file the
program allocated only 192KB in the heap. Especially optimisation of
higher-level constructs like 'fmap (toEnun . fromEnum) ...' is very
nice.

Code attached. Feel free to improve it.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links

import System.IO
import Data.Array.IO
import Data.Array.Base (unsafeRead)
import Data.Word
import Char
import List

wc :: Handle - IO (Int, Int, Int)
wc h = do
buf - newArray_ (0, bufSize - 1) :: IO (IOUArray Int Word8)
let
wcLoop :: Char - Int - Int - Int - Int - Int - IO (Int, Int, Int)
wcLoop prev nl nw nc i n 
| prev `seq` nl `seq` nw `seq` nc `seq` i `seq` n `seq` False =
undefined
| i == n =
do  n' - hGetArray h buf bufSize
if n' == 0
then return (nl, nw, nc)
else wcLoop prev nl nw nc 0 n'
| otherwise =
do  c - fmap (toEnum . fromEnum) (unsafeRead buf i)
wcLoop
c
(nl + if c == '\n' then 1 else 0)
(nw + if not (isSpace c)  isSpace prev then 1 else 0)
(nc + 1)
(i + 1)
n
wcLoop ' ' 0 0 0 0 0
  where
bufSize :: Int
bufSize = 8192

main = do
(nl, nw, nc) - wc stdin
putStrLn $ concat $ intersperse   $ map show [nl, nw, nc]

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OCaml list sees abysmal Language Shootout results

2004-09-28 Thread Henning Thielemann

On Tue, 28 Sep 2004, Tomasz Zielonka wrote:

 Changed readArray to unsafeRead, and it is 47 times faster now.
 
 I must say I am pleasantly surprised that GHC managed to unbox
 everything there was to unbox without much annotations. For 5MB file the
 program allocated only 192KB in the heap. Especially optimisation of
 higher-level constructs like 'fmap (toEnun . fromEnum) ...' is very
 nice.

Now I like to see an implementation which is both elegant and fast ... 

:-)

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe