Re: [Haskell-cafe] A tale of three shootout entries

2007-12-03 Thread Don Stewart
s.clover:
Was this with tossing the partial sums code into the optimised bangs
program? Weird. I wonder if profiling will help explain why? In any case,
If nobody comes up with any other tweaks, I'll probably submit the
optimised bangs version to the shootout this weekend.
 

Please go ahead and submit. :) and remember to upload also to our wiki,
so we have a permanent record of the attempt,

http://haskell.org/haskellwiki/Shootout

Note down any ideas you have.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: [Haskell-cafe] A tale of three shootout entries

2007-12-03 Thread Andrew Coppin

Don Stewart wrote:

Please go ahead and submit. :) and remember to upload also to our wiki,
so we have a permanent record of the attempt,

http://haskell.org/haskellwiki/Shootout

Note down any ideas you have.
  


Now that GHC 6.6 is available, please you it?

Last time I looked at the shootout website, 6 of the GHC entries were 
marked simply as error. Do we know why, or am I missing something 
obvious? (I find the site to be a little unintuitive at times...)


Unfortunately I don't understand what half the benchmarks are supposed 
to be, which makes it rather hard to follow.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-12-03 Thread Don Stewart
andrewcoppin:
 Don Stewart wrote:
 Please go ahead and submit. :) and remember to upload also to our wiki,
 so we have a permanent record of the attempt,
 
 http://haskell.org/haskellwiki/Shootout
 
 Note down any ideas you have.
   
 
 Now that GHC 6.6 is available, please you it?

Looks like something broke in an edit. Feel free to correct it.
  
 Last time I looked at the shootout website, 6 of the GHC entries were 
 marked simply as error. Do we know why, or am I missing something 
 obvious? (I find the site to be a little unintuitive at times...)

Sounds like you're looking at the wrong thing?


http://shootout.alioth.debian.org/gp4/benchmark.php?test=alllang=ghclang2=ghc

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-12-03 Thread Andrew Coppin

Don Stewart wrote:

andrewcoppin:
  

Now that GHC 6.6 is available, please you it?



Looks like something broke in an edit. Feel free to correct it.
  


Oh well. ;-)

But then, the GHC wiki still says The 6.8 branch is the current STABLE, 
and we are in the 6.8.1 release candidate phase. We aim to release 6.8.1 
around the beginning of October. I guess most people are busy writing 
the real stuff rather than updating documentation.


Last time I looked at the shootout website, 6 of the GHC entries were 
marked simply as error. Do we know why, or am I missing something 
obvious? (I find the site to be a little unintuitive at times...)



Sounds like you're looking at the wrong thing?


http://shootout.alioth.debian.org/gp4/benchmark.php?test=alllang=ghclang2=ghc
  


Mmm, interesting. I was looking at

http://shootout.alioth.debian.org/debian/benchmark.php?test=alllang=ghclang2=ghc

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-30 Thread Richard Kelsall

Sterling Clover wrote:
I'm still curious if the pre-calculation of partial sums that I did 
works well across processors, as I don't see why it shouldn't. My 
less-strictified version of Don's code is attached, and below are the 
functions you'll need to insert/replace to make the partial-sums 
optimization work.


Hello Sterling, I've timed your new Fasta with optimised bangs - it's
the fastest so far. But the pre-calculated partial-sums version seems
to go a bit slower for some unknown reason.

  Seconds
Optimised bangs program11.20compiled ghc --make
Optimised bangs program10.73compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4
Partial-sums program   11.97compiled ghc --make
Partial-sums program   11.14compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4

This is on my GHC 6.6.1, W2K, Intel Core 2 Duo 2.33GHz machine - same
as for the previous timings I gave in this thread.


Richard.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-30 Thread Sterling Clover
Was this with tossing the partial sums code into the optimised bangs
program? Weird. I wonder if profiling will help explain why? In any case, If
nobody comes up with any other tweaks, I'll probably submit the optimised
bangs version to the shootout this weekend.

--S

On Nov 30, 2007 1:30 PM, Richard Kelsall [EMAIL PROTECTED] wrote:

 Sterling Clover wrote:
  I'm still curious if the pre-calculation of partial sums that I did
  works well across processors, as I don't see why it shouldn't. My
  less-strictified version of Don's code is attached, and below are the
  functions you'll need to insert/replace to make the partial-sums
  optimization work.

 Hello Sterling, I've timed your new Fasta with optimised bangs - it's
 the fastest so far. But the pre-calculated partial-sums version seems
 to go a bit slower for some unknown reason.

   Seconds
 Optimised bangs program11.20compiled ghc --make
 Optimised bangs program10.73compiled with -O -fglasgow-exts
-optc-mfpmath=sse -optc-msse2
-optc-march=pentium4
 Partial-sums program   11.97compiled ghc --make
 Partial-sums program   11.14compiled with -O -fglasgow-exts
-optc-mfpmath=sse -optc-msse2
-optc-march=pentium4

 This is on my GHC 6.6.1, W2K, Intel Core 2 Duo 2.33GHz machine - same
 as for the previous timings I gave in this thread.


 Richard.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] A tale of three shootout entries

2007-11-28 Thread Simon Peyton-Jones
|  If, after investigation (and perhaps checking with Don) you find that 
adding bangs makes your program go
| slower, even though the function is in fact strict (otherwise it might go 
slower because it's just doing more
| work!) then I'd love to see a test case.
|
| I wonder if this could be related to what I observed with AVL trees and
| mentioned a while back (using a strict data type is slower than using
| explicit seqs to get the same strictness).

Could indeed be.  That message is still in my performance-tuning pile; it's not 
forgotten, just buried.

But the more evidence, the stronger the incentive to investigate.

Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] A tale of three shootout entries

2007-11-28 Thread Simon Peyton-Jones
| There may well have been changes to the strictness analyser that make
| some of the bangs (or most) unnecessary now. Also, its very likely
| I didn't check all combinations of strict and lazy arguments for the
| optimal evaluation strategy :)
|
| If it seems to be running consitently faster (and producing better Core
| code), by all means submit. I don't think this is a ghc bug or anything
| like that though: just overuse of bangs, leading to unnecessary work.

You might think that unnecessary bangs shouldn't lead to unnecessary work -- if 
GHC knows it's strict *and* you bang the argument, it should still only be 
evaluated once. But it can happen.  Consider

f !xs = length xs

Even though 'length' will evaluate its argument, f nevertheless evaluates it 
too.  Bangs say evaluate it now, like seq, because we may be trying to 
control space usage.  In this particular case it's silly, because the *first* 
thing length does is evaluate its argument, but that's not true of every strict 
function.

That's why I say it'd be good to have well-characterised examples.  It *may* be 
something like what I describe. Or it may be a silly omission somewhere.

Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-28 Thread Kalman Noel
Simon Peyton-Jones wrote:
 You might think that unnecessary bangs shouldn't lead to unnecessary work --
 if GHC knows it's strict *and* you bang the argument, it should still only be
 evaluated once. But it can happen.  Consider
 
 f !xs = length xs
 
 Even though 'length' will evaluate its argument, f nevertheless evaluates it
 too.

I'm replying to a guru here, so I should keep my voice low; but I'd like to
point out what might result in a misunderstanding for other readers of
haskell-cafe. Contrasting both the bang pattern and the usage of length causing
f to be strict, one might suppose that a bang pattern, when used on a list, will
cause it to be evaluated in the same way as length does. However,

 the *first* thing length does is evaluate its argument,

and it will furthermore evaluate the argument list recursively, as much as is
necessary to determine its length. On the other hand, given

g !xs = ()

evaluating g [0..] will terminate, because g is only strict in the constructor
of its argument, which is (:). The list data type itself, however, is
non-strict.

Kalman

--
Free pop3 email with a spam filter.
http://www.bluebottle.com/tag/5

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Richard Kelsall

Sterling Clover wrote:
...

Finally, there's fasta.

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fastalang=ghcid=2 

This one really depresses me. It outperforms the previous version by 
roughly 20% on my machine (PPC) but underperforms by roughly the same 
amount on the shootout box.  

...

Well done. Great, I'll have a play with your new version of Fasta, I've
just upgraded to an Intel Core 2 Duo 2.33GHz.

Something I found with Dons version on my machine was that if I removed
all the exclamation marks and the -fbang-patterns bit at the top it went
about 20% faster as well as being much cleaner code, but with my very
rudimentary understanding of Haskell I wasn't entirely sure it would
produce the same results if I did this and didn't get round to checking.

I suspect the majority of Fasta time is spent in the rand routine. My
wild cunning plan for this was that it might be possible to avoid doing
the conversion to a float every time the routine is called. My thinking
is that it could just return an Int most of the time because the number
is only used in, I think, a less-than comparison outside rand which
could almost always be decided by an Int less-than comparison rather
than a float less-than comparison. A clever lazy less-than mechanism
could get rand to give a float when the Int comparison is too close to
be certain. Probably be classified by the shootout as cheating though.
And it's way beyond what I could currently write in Haskell.


Richard.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Simon Peyton-Jones
| Something I found with Dons version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding 
bangs makes your program go slower, even though the function is in fact strict 
(otherwise it might go slower because it's just doing more work!) then I'd love 
to see a test case.

Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Adrian Hey

Simon Peyton-Jones wrote:

| Something I found with Dons version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding 
bangs makes your program go slower, even though the function is in fact strict 
(otherwise it might go slower because it's just doing more work!) then I'd love 
to see a test case.


I wonder if this could be related to what I observed with AVL trees and
mentioned a while back (using a strict data type is slower than using
explicit seqs to get the same strictness).

Regards
--
Adrian Hey




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Richard Kelsall

Simon Peyton-Jones wrote:

| Something I found with Dons version on my machine was that if I removed
| all the exclamation marks and the -fbang-patterns bit at the top it went
| about 20% faster as well as being much cleaner code, but with my very
| rudimentary understanding of Haskell I wasn't entirely sure it would
| produce the same results if I did this and didn't get round to checking.

If, after investigation (and perhaps checking with Don) you find that adding 
bangs makes your program go slower, even though the function is in fact strict 
(otherwise it might go slower because it's just doing more work!) then I'd love 
to see a test case.


Sorry, I don't understand the code, I've jumped in the deep-end before
learning to swim, but I can now tell you it's producing the same results
when I remove some of the exclamation marks. I've checked with an MD5 on
the output.

The timings in seconds for 10,000,000 iterations averaged over 5 runs.
(There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got
stuck compiling it under 6.8) The fancy compile options are from the
shootout page.

Dons original program  13.26compiled ghc --make
Dons original program  12.54compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4
Removed 3 bangs from rand  11.47compiled ghc --make
Removed 3 bangs from rand  11.57compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4

The code below is Dons program from

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fastalang=ghcid=0

with a timing function added by me. The rand function is where I removed
three exclamation marks to make the program faster. Previously I removed
different combinations of bangs. Some bangs seem to make it faster and
some seem to make it slower.

Richard.



--
{-# OPTIONS -O2 -optc-O2 -optc-ffast-math -fbang-patterns 
-fexcess-precision #-}

--
-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart
-- A lazy bytestring solution.
--
-- Add:
-- -optc-mfpmath=sse -optc-msse2
--

import System
import Data.Word
import Control.Arrow

import Text.Printf -- RK added.
import System.CPUTime  -- RK added.

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C (pack,unfoldr)
import qualified Data.ByteString as S
import Data.ByteString.Base


-- RK added this time function.
time :: IO t - IO t
time a = do
start - getCPUTime
v - a
end   - getCPUTime
let diff = (fromIntegral (end - start)) / (10 ^12)
printf Calc time %0.3f \n (diff :: Double)
return v


main = do -- RK modified main to time the computation.
time $ comp   -- RK mod.

comp :: IO Int-- RK mod.
comp = do -- RK mod. This was Dons main. I just renamed to comp.
n - getArgs = readIO . head
writeFasta  ONE   Homo sapiens alu   (n*2) (L.cycle alu)
g - unfold TWO   IUB ambiguity codes(n*3) (look iubs) 42
unfold  THREE Homo sapiens frequency (n*5) (look homs) g


--
-- lazily unfold the randomised dna sequences
--

unfold l t n f !g = putStrLn ( ++ l ++   ++ t)  unroll f g n

unroll :: (Int - (Word8, Int)) - Int - Int - IO Int
unroll f = loop
where
loop r 0   = return r
loop !r !i = case S.unfoldrN m (Just . f) r of
(!s, Just r') - do
S.putStrLn s
loop r' (i-m)
  where m = min i 60

look ds !k = let (d,j) = rand k in (choose ds d, j)

choose :: [(Word8,Float)] - Float - Word8
choose [(b,_)]   _ = b
choose ((!b,!f):xs) !p = if p  f then b else choose xs (p-f)


--
-- only demand as much of the infinite sequence as we require

writeFasta label title n s = do
 putStrLn $  ++ label ++   ++ title
 let (t:ts) = L.toChunks s
 go ts t n
  where
 go ss !s !n
| l60  n60 = S.putStrLn lgo ssr (n-60)
|n60 = S.putStr s  S.putStrLn a  go (tail ss) b (n-60)
| n = ln= S.putStrLn (S.take n s)
| otherwise  = S.putStr s  S.putStrLn (S.take (n-ln) (head ss))
where
!ln   = S.length s
!l60  = ln = 60
!n60  = n  = 60
(l,r) = S.splitAt 60 s
(a,b) = S.splitAt (60-ln) (head ss)



im  = 139968
ia  = 3877
ic  = 29573

rand :: Int - (Float, Int)
rand seed = (newran,newseed) -- RK modified. Was !seed
where
newseed = (seed * ia + ic) `rem` im  -- RK mod. Was 

Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Don Stewart
r.kelsall:
 Simon Peyton-Jones wrote:
 | Something I found with Dons version on my machine was that if I removed
 | all the exclamation marks and the -fbang-patterns bit at the top it went
 | about 20% faster as well as being much cleaner code, but with my very
 | rudimentary understanding of Haskell I wasn't entirely sure it would
 | produce the same results if I did this and didn't get round to checking.
 
 If, after investigation (and perhaps checking with Don) you find that 
 adding bangs makes your program go slower, even though the function is in 
 fact strict (otherwise it might go slower because it's just doing more 
 work!) then I'd love to see a test case.
 
 Sorry, I don't understand the code, I've jumped in the deep-end before
 learning to swim, but I can now tell you it's producing the same results
 when I remove some of the exclamation marks. I've checked with an MD5 on
 the output.
 
 The timings in seconds for 10,000,000 iterations averaged over 5 runs.
 (There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got
 stuck compiling it under 6.8) The fancy compile options are from the
 shootout page.
 
 Dons original program  13.26compiled ghc --make
 Dons original program  12.54compiled with -O -fglasgow-exts
-optc-mfpmath=sse -optc-msse2
-optc-march=pentium4
 Removed 3 bangs from rand  11.47compiled ghc --make
 Removed 3 bangs from rand  11.57compiled with -O -fglasgow-exts
-optc-mfpmath=sse -optc-msse2
-optc-march=pentium4
 
 The code below is Dons program from
 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=fastalang=ghcid=0
 
 with a timing function added by me. The rand function is where I removed
 three exclamation marks to make the program faster. Previously I removed
 different combinations of bangs. Some bangs seem to make it faster and
 some seem to make it slower.

There may well have been changes to the strictness analyser that make
some of the bangs (or most) unnecessary now. Also, its very likely 
I didn't check all combinations of strict and lazy arguments for the
optimal evaluation strategy :)

If it seems to be running consitently faster (and producing better Core
code), by all means submit. I don't think this is a ghc bug or anything
like that though: just overuse of bangs, leading to unnecessary work.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Ryan Dickie
Oops forgot to hit reply-to-all.. resending..

N-body is looking good. I am running and amd64 3000+ on ghc 6.8.1.  The
debian shootout is showing a huge gap between ghc 6.6 and g++ but I am not
seeing that gap.  One concern though is that the code doesn't look very
haskellish. So much pointer manip.

For the nbody c++ code I am getting:
-0.169075164
-0.169031665

real0m11.168s
user0m10.891s
sys 0m0.043s

and for the nbody haskell code I am getting:
-0.169075164
-0.169031665

real0m11.595s
user0m11.422s
sys 0m0.044s


On Nov 26, 2007 8:21 PM, Don Stewart [EMAIL PROTECTED] wrote:

 s.clover:
  In some spare time over the holidays I cooked up three shootout
  entries, for Fasta, the Meteor Contest, and Reverse Complement. I

 Yay!

  First up is the meteor-contest entry.
 
  http://shootout.alioth.debian.org/gp4/benchmark.php?
  test=meteorlang=ghcid=5
 
  This is the clear win of the bunch, with significantly improved time
  thanks to its translation of the better algorithm from Clean.

 Well done! Though looks like we'll have to follow the C++ implementation
 to be really competitive.

  Next is reverse-complement.
 
  http://shootout.alioth.debian.org/gp4/benchmark.php ?
  test=revcomplang=ghcid=3

 Very good. I'm glad someone looked at that, since the old code was
 moderately naive (first bytestring effort).

  Finally, there's fasta.
 
  http://shootout.alioth.debian.org/gp4/benchmark.php?
  test=fastalang=ghcid=2

 Yeah, we should do something better here. Hmm.

  p.s. It looks like they've depreciated chameneos in favor of a new
  version, chameneos-redux. As this was one of the places Haskell
  really rocked the competition, it would probably be worth updating

 Definitely. I note also we're beating Erlang on the new thread-ring
 benchmark too,


 http://shootout.alioth.debian.org/gp4/benchmark.php?test=threadringlang=all

  the Haskell entry for the new benchmark. Also, the n-bodies benchmark
  seems like another that could be much improved.

 Yeah, that's a hard one.

 -- Don
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Ryan Dickie
Never mind. I screwed up the timings.
The new haskell timings are still a huge improvement but they are:

-0.169075164
-0.169031665

real0m27.196s
user0m19.688s
sys 0m0.163s


On Nov 27, 2007 11:25 AM, Ryan Dickie [EMAIL PROTECTED] wrote:

 Oops forgot to hit reply-to-all.. resending..


 N-body is looking good. I am running and amd64 3000+ on ghc 6.8.1.  The
 debian shootout is showing a huge gap between ghc 6.6 and g++ but I am not
 seeing that gap.  One concern though is that the code doesn't look very
 haskellish. So much pointer manip.

 For the nbody c++ code I am getting:
 -0.169075164
 -0.169031665

 real0m11.168s
 user0m10.891s
 sys 0m0.043s

 and for the nbody haskell code I am getting:
 -0.169075164
 -0.169031665

 real0m11.595s
 user0m11.422s
 sys 0m0.044s


 On Nov 26, 2007 8:21 PM, Don Stewart [EMAIL PROTECTED] wrote:

  s.clover:
   In some spare time over the holidays I cooked up three shootout
   entries, for Fasta, the Meteor Contest, and Reverse Complement. I
 
  Yay!
 
   First up is the meteor-contest entry.
  
   http://shootout.alioth.debian.org/gp4/benchmark.php?
   test=meteorlang=ghcid=5
  
   This is the clear win of the bunch, with significantly improved time
   thanks to its translation of the better algorithm from Clean.
 
  Well done! Though looks like we'll have to follow the C++ implementation
 
  to be really competitive.
 
   Next is reverse-complement.
  
   http://shootout.alioth.debian.org/gp4/benchmark.php ?
   test=revcomplang=ghcid=3
 
  Very good. I'm glad someone looked at that, since the old code was
  moderately naive (first bytestring effort).
 
   Finally, there's fasta.
  
   http://shootout.alioth.debian.org/gp4/benchmark.php?
   test=fastalang=ghcid=2
 
  Yeah, we should do something better here. Hmm.
 
   p.s. It looks like they've depreciated chameneos in favor of a new
   version, chameneos-redux. As this was one of the places Haskell
   really rocked the competition, it would probably be worth updating
 
  Definitely. I note also we're beating Erlang on the new thread-ring
  benchmark too,
 
 
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=threadringlang=all
 
   the Haskell entry for the new benchmark. Also, the n-bodies benchmark
   seems like another that could be much improved.
 
  Yeah, that's a hard one.
 
  -- Don
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe
 



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Richard Kelsall

Don Stewart wrote:
...


There may well have been changes to the strictness analyser that make
some of the bangs (or most) unnecessary now. Also, its very likely 
I didn't check all combinations of strict and lazy arguments for the

optimal evaluation strategy :)


I suspect the optimum details will change again when we get to GHC 6.8.
Yes, I got bored trying different combinations too. A genetic algorithm
that knocks out different combinations might be fun. The ones in rand
seem to make the most difference and I decided the code was easier to
read if I took them all out.


If it seems to be running consitently faster (and producing better Core
code), by all means submit. I don't think this is a ghc bug or anything
like that though: just overuse of bangs, leading to unnecessary work.

-- Don



It was consistently faster on my machine, but it would be interesting
to compare with the run-times on Sterling's PPC machine. I'll have a
play with Sterling's new program and report.


Richard.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-27 Thread Sterling Clover
I tried the same thing on my box, and indeed the version that isn't  
strict in the rand function outperforms the original by a fair  
margin, and seems to do slightly better than my own as well. Killing  
the bangs in the unroll function also seems to help (especially that  
in (s!, Just r')). Why this is is slightly beyond me at the moment.  
Killing the bang before the b in the choose function also adds a  
speedup, which makes perfect sense, as there's no reason to force  
strictness on an argument you're throwing away a good span of the  
time. The bang before the k in the look function should stay -- in  
fact, it seems the appropriate place to force the evaluation that we  
were forcing too early in some of the other functions. Ditto the bang  
before the g in unfold. As for the bangs in writeFasta, better to  
leave them be and not risk messing things up, since, as is, the  
writeFasta function uses nearly no cycles compared to random generation.


At this point, given that lazier random generation seems to be  
better, using unboxed types for this seems a losing idea, as they'd  
force strictness all over again, so that's not worth trying to salvage.


I'm still curious if the pre-calculation of partial sums that I did  
works well across processors, as I don't see why it shouldn't. My  
less-strictified version of Don's code is attached, and below are the  
functions you'll need to insert/replace to make the partial-sums  
optimization work.


Regards,
Sterl

P.S., if you're running on a unix, I find it much more convenient to  
use the time program rather than rolling timing into my own code. I  
tested this program using, for example  time ./fastaRefUnStrict  
25 | tail




fastaRefUnStrict.hs
Description: Binary data


--

Code for partial sums:

choose :: [(Word8,Float)] - Float - Word8
choose [(b,_)]   _ = b
choose ((b,f):xs) p = if p  f then b else choose xs p

makeCumul :: [(Word8,Float)]-[(Word8,Float)]
makeCumul freqMap = tail . reverse . foldl' fm [(undefined,0)] $ freqMap
where fm acc@((_,ct):rst) (w,f) = (w,ct + f) : acc

iubs :: [(Word8,Float)]
iubs = makeCumul $ map (first c2w)
[('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]

homs :: [(Word8,Float)]
homs = makeCumul $ map (first c2w)
[('a',0.3029549426680),('c',0.1979883004921)
,('g',0.1975473066391),('t',0.3015094502008)]

On Nov 27, 2007, at 2:09 PM, Richard Kelsall wrote:


Simon Peyton-Jones wrote:
| Something I found with Dons version on my machine was that if I  
removed
| all the exclamation marks and the -fbang-patterns bit at the top  
it went
| about 20% faster as well as being much cleaner code, but with my  
very
| rudimentary understanding of Haskell I wasn't entirely sure it  
would
| produce the same results if I did this and didn't get round to  
checking.
If, after investigation (and perhaps checking with Don) you find  
that adding bangs makes your program go slower, even though the  
function is in fact strict (otherwise it might go slower because  
it's just doing more work!) then I'd love to see a test case.


Sorry, I don't understand the code, I've jumped in the deep-end before
learning to swim, but I can now tell you it's producing the same  
results
when I remove some of the exclamation marks. I've checked with an  
MD5 on

the output.

The timings in seconds for 10,000,000 iterations averaged over 5 runs.
(There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got
stuck compiling it under 6.8) The fancy compile options are from the
shootout page.

Dons original program  13.26compiled ghc --make
Dons original program  12.54compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4
Removed 3 bangs from rand  11.47compiled ghc --make
Removed 3 bangs from rand  11.57compiled with -O -fglasgow-exts
   -optc-mfpmath=sse -optc-msse2
   -optc-march=pentium4

The code below is Dons program from

http://shootout.alioth.debian.org/gp4/benchmark.php? 
test=fastalang=ghcid=0


with a timing function added by me. The rand function is where I  
removed
three exclamation marks to make the program faster. Previously I  
removed

different combinations of bangs. Some bangs seem to make it faster and
some seem to make it slower.

Richard.



--
{-# OPTIONS -O2 -optc-O2 -optc-ffast-math -fbang-patterns -fexcess- 
precision #-}

--
-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart
-- A lazy bytestring solution.
--
-- Add:
-- -optc-mfpmath=sse -optc-msse2
--

import System
import Data.Word
import Control.Arrow

import Text.Printf  

[Haskell-cafe] A tale of three shootout entries

2007-11-26 Thread Sterling Clover
In some spare time over the holidays I cooked up three shootout  
entries, for Fasta, the Meteor Contest, and Reverse Complement. I  
should probably have tossed them to haskell-cafe before submission,  
for review and ideas, but they're up now. In any case, all three were  
great learning experiences, but could use some other eyes and ideas  
to be the best that they can.


First up is the meteor-contest entry.

http://shootout.alioth.debian.org/gp4/benchmark.php? 
test=meteorlang=ghcid=5


This is the clear win of the bunch, with significantly improved time  
thanks to its translation of the better algorithm from Clean.  
However, it's still missing something. Nearly all its time is spent  
in a tight loop in the solveCell function near the end of the code. I  
tried unboxing this, but failed because it spends the bulk of its  
time applying a bitwise and between the recursively passed value and  
a piece of data retrieved from the masksAtCell data structure which  
is of type :: Array (Row,Col) (Array Color [Mask]). (Note that Mask,  
Color, Row and Col are all type synonyms for Int that I added for  
readability).  As each Mask is stored in this list, the masks can't  
easily be unboxed -- some sort of custom data structure built using  
the FFI seems in order here. If anyone wants to tackle this, I think  
it could be a big win for performance.


Next is reverse-complement.

http://shootout.alioth.debian.org/gp4/benchmark.php? 
test=revcomplang=ghcid=3


This *would* be a big win except I dimly doubled memory usage from  
the previous entry due to filtering newlines explicitly -- which  
does, one should note, provide a large performance gain. The solution  
here seems the most obvious -- roll the newline stripping into the  
destructive modifications performed in the revcomp function, as the  
winning C++ entry does. I'll probably get around to this eventually,  
but if someone else wants to try to implement this or any other  
performance improvements, please jump right in. Additionally, there  
might be some other tricks to reducing its memory usage that escape  
me at the moment (noting, of course, that using a strict bytestring,  
as we should, its unavoidable that we consume the entire contents of  
the input at once... I think?)


Finally, there's fasta.

http://shootout.alioth.debian.org/gp4/benchmark.php? 
test=fastalang=ghcid=2


This one really depresses me. It outperforms the previous version by  
roughly 20% on my machine (PPC) but underperforms by roughly the same  
amount on the shootout box. If you compare it to dons previous  
version, the optimizations I attempted should be pretty obvious.  
First, I precompute the partial sums for the frequency tables and  
alter the choose function accordingly. This is a pretty basic measure  
that all the better entries seem to do. Next, I unboxed the random  
function, which yielded big speedups However, given that we use an  
unfoldN, as we should, I couldn't very well pass it a function that  
returned something of kind #, (especially as Maybe, which unfoldN  
uses, is of kind *-*). Thus, I hid the unrolled loop in a lazy list  
of floats that is passed instead of a random seed. I suspect that for  
some reason I don't understand relating to differences in processors,  
GHC's internal handling of floating point math, or who knows what,  
this somehow is the source of the slowdown. If someone with an Intel  
Pentium 4 machine comparable to that of the shootout box wants to  
take a look at this code and see why it underperforms, I'd be much  
obliged. It really seems to me that GHC's fasta performance is far  
below where it should be (4x slower than Java!) , and I'd like to get  
its numbers up somehow.


Thanks,
Sterl

p.s. It looks like they've depreciated chameneos in favor of a new  
version, chameneos-redux. As this was one of the places Haskell  
really rocked the competition, it would probably be worth updating  
the Haskell entry for the new benchmark. Also, the n-bodies benchmark  
seems like another that could be much improved.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A tale of three shootout entries

2007-11-26 Thread Don Stewart
s.clover:
 In some spare time over the holidays I cooked up three shootout  
 entries, for Fasta, the Meteor Contest, and Reverse Complement. I  

Yay!

 First up is the meteor-contest entry.
 
 http://shootout.alioth.debian.org/gp4/benchmark.php? 
 test=meteorlang=ghcid=5
 
 This is the clear win of the bunch, with significantly improved time  
 thanks to its translation of the better algorithm from Clean.  

Well done! Though looks like we'll have to follow the C++ implementation 
to be really competitive.

 Next is reverse-complement.
 
 http://shootout.alioth.debian.org/gp4/benchmark.php? 
 test=revcomplang=ghcid=3

Very good. I'm glad someone looked at that, since the old code was
moderately naive (first bytestring effort).

 Finally, there's fasta.
 
 http://shootout.alioth.debian.org/gp4/benchmark.php? 
 test=fastalang=ghcid=2
  
Yeah, we should do something better here. Hmm.
  
 p.s. It looks like they've depreciated chameneos in favor of a new  
 version, chameneos-redux. As this was one of the places Haskell  
 really rocked the competition, it would probably be worth updating  

Definitely. I note also we're beating Erlang on the new thread-ring 
benchmark too,

http://shootout.alioth.debian.org/gp4/benchmark.php?test=threadringlang=all

 the Haskell entry for the new benchmark. Also, the n-bodies benchmark  
 seems like another that could be much improved.

Yeah, that's a hard one.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe